Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models
About
Recent advances in large language models have led to numerous task-specialized fine-tuned variants, creating a need for efficient model merging techniques that preserve specialized capabilities while avoiding costly retraining. While existing task vector-based merging methods show promise, they typically apply uniform coefficients across all parameters, overlooking varying parameter importance both within and across tasks. We present Sens-Merging, a sensitivity-guided coefficient adjustment method that enhances existing model merging techniques by operating at both task-specific and cross-task levels. Our method analyzes parameter sensitivity within individual tasks and evaluates cross-task transferability to determine optimal merging coefficients. Extensive experiments on Mistral 7B and LLaMA2-7B/13B models demonstrate that Sens-Merging significantly improves performance across general knowledge, mathematical reasoning, and code generation tasks. Notably, when combined with existing merging techniques, our method enables merged models to outperform specialized fine-tuned models, particularly in code generation tasks. Our findings reveal important trade-offs between task-specific and cross-task scalings, providing insights for future model merging strategies.
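The paper's exact sensitivity computation is not reproduced here, but the overall idea — merging task vectors with non-uniform, sensitivity-derived coefficients instead of a single uniform scale — can be sketched as follows. This is a minimal illustration, assuming a hypothetical sensitivity proxy (mean absolute magnitude of each task vector, normalized with a softmax); the function names `task_vector`, `sensitivity_coeffs`, and `sens_merge` are placeholders, not the authors' API.

```python
import numpy as np

def task_vector(base, finetuned):
    # Task vector: fine-tuned parameters minus base parameters,
    # computed per named parameter tensor.
    return {k: finetuned[k] - base[k] for k in base}

def sensitivity_coeffs(task_vectors, temperature=1.0):
    # Hypothetical sensitivity proxy: score each task by the mean
    # absolute magnitude of its task vector, then normalize the
    # scores into merging coefficients with a softmax.
    scores = np.array([
        np.mean([np.abs(v).mean() for v in tv.values()])
        for tv in task_vectors
    ])
    exp = np.exp(scores / temperature)
    return exp / exp.sum()

def sens_merge(base, finetuned_models):
    # Merge: add each task vector to the base model, scaled by its
    # sensitivity-derived coefficient rather than a uniform constant.
    tvs = [task_vector(base, ft) for ft in finetuned_models]
    coeffs = sensitivity_coeffs(tvs)
    merged = {k: base[k].copy() for k in base}
    for c, tv in zip(coeffs, tvs):
        for k in merged:
            merged[k] += c * tv[k]
    return merged
```

In this sketch the coefficients sum to one, so tasks whose parameters shifted more during fine-tuning contribute proportionally more to the merged model; the actual method additionally distinguishes task-specific and cross-task sensitivity levels.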
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Code Generation | HumanEval | -- | 850 |
| Mathematical Reasoning | MATH | Accuracy: 17.06 | 535 |
| General Knowledge | MMLU | Accuracy: 62.43 | 170 |
| Code Generation | MBPP | Accuracy: 55.1 | 120 |
| Truthful QA | TruthfulQA | Accuracy: 48.71 | 83 |
| Code Generation | HumanEval and MBPP | -- | 30 |
| Mathematical Reasoning | GSM8K and MATH | GSM8K Score: 55.42 | 27 |
| General Knowledge | HellaSwag | Accuracy: 0.6194 | 13 |
| General Knowledge | MMLU, HellaSwag, TruthfulQA | MMLU: 55.88 | 9 |
| General Performance | Aggregated MMLU, HellaSwag, TruthfulQA, GSM8K, MATH, MBPP, HumanEval | Average Score: 40.35 | 9 |