model-merging
Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.
Install
mkdir -p .claude/skills/model-merging && curl -L -o skill.zip "https://mcp.directory/api/skills/download/6796" && unzip -o skill.zip -d .claude/skills/model-merging && rm skill.zipInstalls to .claude/skills/model-merging
About this skill
Model Merging: Combining Pre-trained Models
When to Use This Skill
Use Model Merging when you need to:
- Combine capabilities from multiple fine-tuned models without retraining
- Create specialized models by blending domain-specific expertise (math + coding + chat)
- Improve performance beyond single models (often +5-10% on benchmarks)
- Reduce training costs - no GPUs needed, merges run on CPU
- Experiment rapidly - create new model variants in minutes, not days
- Preserve multiple skills - merge without catastrophic forgetting
Success Stories: Marcoro14-7B-slerp (best on Open LLM Leaderboard 02/2024), many top HuggingFace models use merging
Tools: mergekit (Arcee AI), LazyMergekit, Model Soup
Installation
# Install mergekit
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
# Or via pip
pip install mergekit
# Optional: Transformer library
pip install transformers torch
Quick Start
Simple Linear Merge
# config.yml - Merge two models with equal weights
merge_method: linear
models:
- model: mistralai/Mistral-7B-v0.1
parameters:
weight: 0.5
- model: teknium/OpenHermes-2.5-Mistral-7B
parameters:
weight: 0.5
dtype: bfloat16
# Run merge
mergekit-yaml config.yml ./merged-model --cuda
# Use merged model
python -m transformers.models.auto --model_name_or_path ./merged-model
SLERP Merge (Best for 2 Models)
# config.yml - Spherical interpolation
merge_method: slerp
slices:
- sources:
- model: mistralai/Mistral-7B-v0.1
layer_range: [0, 32]
- model: teknium/OpenHermes-2.5-Mistral-7B
layer_range: [0, 32]
parameters:
t: 0.5 # Interpolation factor (0=model1, 1=model2)
dtype: bfloat16
Core Concepts
1. Merge Methods
Linear (Model Soup)
- Simple weighted average of parameters
- Fast, works well for similar models
- Can merge 2+ models
merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
# where w1 + w2 + w3 = 1
SLERP (Spherical Linear Interpolation)
- Interpolates along sphere in weight space
- Preserves magnitude of weight vectors
- Best for merging 2 models
- Smoother than linear
# SLERP formula
merged = (sin((1-t)*θ) / sin(θ)) * model1 + (sin(t*θ) / sin(θ)) * model2
# where θ = arccos(dot(model1, model2))
# t ∈ [0, 1]
Task Arithmetic
- Extract "task vectors" (fine-tuned - base)
- Combine task vectors, add to base
- Good for merging multiple specialized models
# Task vector
task_vector = finetuned_model - base_model
# Merge multiple task vectors
merged = base_model + α₁*task_vector₁ + α₂*task_vector₂
TIES-Merging
- Task arithmetic + sparsification
- Resolves sign conflicts in parameters
- Best for merging many task-specific models
DARE (Drop And REscale)
- Randomly drops fine-tuned parameters
- Rescales remaining parameters
- Reduces redundancy, maintains performance
2. Configuration Structure
# Basic structure
merge_method: <method> # linear, slerp, ties, dare_ties, task_arithmetic
base_model: <path> # Optional: base model for task arithmetic
models:
- model: <path/to/model1>
parameters:
weight: <float> # Merge weight
density: <float> # For TIES/DARE
- model: <path/to/model2>
parameters:
weight: <float>
parameters:
# Method-specific parameters
dtype: <dtype> # bfloat16, float16, float32
# Optional
slices: # Layer-wise merging
tokenizer: # Tokenizer configuration
Merge Methods Guide
Linear Merge
Best for: Simple model combinations, equal weighting
merge_method: linear
models:
- model: WizardLM/WizardMath-7B-V1.1
parameters:
weight: 0.4
- model: teknium/OpenHermes-2.5-Mistral-7B
parameters:
weight: 0.3
- model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
parameters:
weight: 0.3
dtype: bfloat16
SLERP Merge
Best for: Two models, smooth interpolation
merge_method: slerp
slices:
- sources:
- model: mistralai/Mistral-7B-v0.1
layer_range: [0, 32]
- model: teknium/OpenHermes-2.5-Mistral-7B
layer_range: [0, 32]
parameters:
t: 0.5 # 0.0 = first model, 1.0 = second model
dtype: bfloat16
Layer-specific SLERP:
merge_method: slerp
slices:
- sources:
- model: model_a
layer_range: [0, 32]
- model: model_b
layer_range: [0, 32]
parameters:
t:
- filter: self_attn # Attention layers
value: 0.3
- filter: mlp # MLP layers
value: 0.7
- value: 0.5 # Default for other layers
dtype: bfloat16
Task Arithmetic
Best for: Combining specialized skills
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
- model: WizardLM/WizardMath-7B-V1.1 # Math
parameters:
weight: 0.5
- model: teknium/OpenHermes-2.5-Mistral-7B # Chat
parameters:
weight: 0.3
- model: ajibawa-2023/Code-Mistral-7B # Code
parameters:
weight: 0.2
dtype: bfloat16
TIES-Merging
Best for: Many models, resolving conflicts
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
- model: WizardLM/WizardMath-7B-V1.1
parameters:
density: 0.5 # Keep top 50% of parameters
weight: 1.0
- model: teknium/OpenHermes-2.5-Mistral-7B
parameters:
density: 0.5
weight: 1.0
- model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
parameters:
density: 0.5
weight: 1.0
parameters:
normalize: true
dtype: bfloat16
DARE Merge
Best for: Reducing redundancy
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
- model: WizardLM/WizardMath-7B-V1.1
parameters:
density: 0.5 # Drop 50% of deltas
weight: 0.6
- model: teknium/OpenHermes-2.5-Mistral-7B
parameters:
density: 0.5
weight: 0.4
parameters:
int8_mask: true # Use int8 for masks (saves memory)
dtype: bfloat16
Advanced Patterns
Layer-wise Merging
# Different models for different layers
merge_method: passthrough
slices:
- sources:
- model: mistralai/Mistral-7B-v0.1
layer_range: [0, 16] # First half
- sources:
- model: teknium/OpenHermes-2.5-Mistral-7B
layer_range: [16, 32] # Second half
dtype: bfloat16
MoE from Merged Models
# Create Mixture of Experts
merge_method: moe
base_model: mistralai/Mistral-7B-v0.1
experts:
- source_model: WizardLM/WizardMath-7B-V1.1
positive_prompts:
- "math"
- "calculate"
- source_model: teknium/OpenHermes-2.5-Mistral-7B
positive_prompts:
- "chat"
- "conversation"
- source_model: ajibawa-2023/Code-Mistral-7B
positive_prompts:
- "code"
- "python"
dtype: bfloat16
Tokenizer Merging
merge_method: linear
models:
- model: mistralai/Mistral-7B-v0.1
- model: custom/specialized-model
tokenizer:
source: "union" # Combine vocabularies from both models
tokens:
<|special_token|>:
source: "custom/specialized-model"
Best Practices
1. Model Compatibility
# ✅ Good: Same architecture
models = [
"mistralai/Mistral-7B-v0.1",
"teknium/OpenHermes-2.5-Mistral-7B", # Both Mistral 7B
]
# ❌ Bad: Different architectures
models = [
"meta-llama/Llama-2-7b-hf", # Llama
"mistralai/Mistral-7B-v0.1", # Mistral (incompatible!)
]
2. Weight Selection
# ✅ Good: Weights sum to 1.0
models:
- model: model_a
parameters:
weight: 0.6
- model: model_b
parameters:
weight: 0.4 # 0.6 + 0.4 = 1.0
# ⚠️ Acceptable: Weights don't sum to 1 (for task arithmetic)
models:
- model: model_a
parameters:
weight: 0.8
- model: model_b
parameters:
weight: 0.8 # May boost performance
3. Method Selection
# Choose merge method based on use case:
# 2 models, smooth blend → SLERP
merge_method = "slerp"
# 3+ models, simple average → Linear
merge_method = "linear"
# Multiple task-specific models → Task Arithmetic or TIES
merge_method = "ties"
# Want to reduce redundancy → DARE
merge_method = "dare_ties"
4. Density Tuning (TIES/DARE)
# Start conservative (keep more parameters)
parameters:
density: 0.8 # Keep 80%
# If performance good, increase sparsity
parameters:
density: 0.5 # Keep 50%
# If performance degrades, reduce sparsity
parameters:
density: 0.9 # Keep 90%
5. Layer-specific Merging
# Preserve base model's beginning and end
merge_method: passthrough
slices:
- sources:
- model: base_model
layer_range: [0, 2] # Keep first layers
- sources:
- model: merged_middle # Merge middle layers
layer_range: [2, 30]
- sources:
- model: base_model
layer_range: [30, 32] # Keep last layers
Evaluation & Testing
Benchmark Merged Models
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")
# Test on various tasks
test_prompts = {
"math": "Calculate: 25 * 17 =",
"code": "Write a Python function to reverse a string:",
"chat": "What is the capital of France?",
}
for task, prompt in test_prompts.items():
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(f"{task}: {tokenizer.decode(outputs[0])}")
Common Benchmarks
- Open LLM Leaderboard: General capabilities
- MT-Bench: Multi-turn conversation
- MMLU: Multitask accuracy
- HumanEval: Code generation
- GSM8K: Math reasoning
Production Deployment
Save and Upload
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load merged model
model = AutoModelForCaus
---
*Content truncated.*
More by davila7
View all skills by davila7 →You might also like
flutter-development
aj-geddes
Build beautiful cross-platform mobile apps with Flutter and Dart. Covers widgets, state management with Provider/BLoC, navigation, API integration, and material design.
drawio-diagrams-enhanced
jgtolentino
Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams, cross-functional flowcharts, org charts, network diagrams, UML diagrams, BPMN, project management diagrams (WBS, Gantt, PERT, RACI), risk matrices, stakeholder maps, or any other visual diagram in draw.io format. This skill includes access to custom shape libraries for icons, clipart, and professional symbols.
ui-ux-pro-max
nextlevelbuilder
"UI/UX design intelligence. 50 styles, 21 palettes, 50 font pairings, 20 charts, 8 stacks (React, Next.js, Vue, Svelte, SwiftUI, React Native, Flutter, Tailwind). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient."
godot
bfollington
This skill should be used when working on Godot Engine projects. It provides specialized knowledge of Godot's file formats (.gd, .tscn, .tres), architecture patterns (component-based, signal-driven, resource-based), common pitfalls, validation tools, code templates, and CLI workflows. The `godot` command is available for running the game, validating scripts, importing resources, and exporting builds. Use this skill for tasks involving Godot game development, debugging scene/resource files, implementing game systems, or creating new Godot components.
nano-banana-pro
garg-aayush
Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.
fastapi-templates
wshobson
Create production-ready FastAPI projects with async patterns, dependency injection, and comprehensive error handling. Use when building new FastAPI applications or setting up backend API projects.
Related MCP Servers
Browse all serversBoost productivity with Task Master: an AI-powered tool for project management and agile development workflows, integrat
Scan your website for viruses and vulnerabilities with Code Audit (Ollama). Get a comprehensive site scanner virus check
AI Vision MCP Server enables AI image analysis and video content analysis with Google Gemini & Vertex AI—object detectio
Unlock AI-ready web data with Firecrawl: scrape any website, handle dynamic content, and automate web scraping for resea
Enhance software testing with Playwright MCP: Fast, reliable browser automation, an innovative alternative to Selenium s
Optimize your codebase for AI with Repomix—transform, compress, and secure repos for easier analysis with modern AI tool
Stay ahead of the MCP ecosystem
Get weekly updates on new skills and servers.