LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
About
Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs. While model merging offers a training-free solution to integrate multiple task-specific models, existing methods suffer from safety-utility conflicts where enhanced general capabilities degrade safety safeguards. We identify two root causes: **neuron misidentification** due to simplistic parameter magnitude-based selection, and **cross-task neuron interference** during merging. To address these challenges, we propose **LED-Merging**, a three-stage framework that **L**ocates task-specific neurons via gradient-based attribution, dynamically **E**lects critical neurons through multi-model importance fusion, and **D**isjoints conflicting updates through parameter isolation. Extensive experiments on Llama-3-8B, Mistral-7B, and Llama2-13B demonstrate that LED-Merging effectively reduces harmful response rates, showing a 31.4% decrease on Llama-3-8B-Instruct on HarmBench, while simultaneously preserving 95% of utility performance, such as achieving 52.39% accuracy on GSM8K. LED-Merging resolves safety-utility conflicts and provides a lightweight, training-free paradigm for constructing reliable multi-task LLMs. Code is available on GitHub: https://github.com/MqLeet/LED-Merging
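The three stages named in the abstract can be illustrated with a toy sketch on flat parameter lists. This is not the authors' implementation (see the linked repository for that). The function names, the gradient-times-weight score, the top-k intersection used for election, the rule that drops any position claimed by more than one task, and all numeric values are assumptions made for illustration only.

```python
# Toy sketch of a Locate / Elect / Disjoint merge on flat parameter lists.
# All names, scoring rules, and numbers are illustrative assumptions,
# not the LED-Merging reference implementation.

def locate(weights, grads):
    """Assumed attribution score: |weight * gradient| per parameter."""
    return [abs(w * g) for w, g in zip(weights, grads)]

def top_k(scores, k):
    """Indices of the k highest-scoring parameters."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def elect(base_scores, tuned_scores, k):
    """Keep positions that rank in the top k for BOTH the base and the
    fine-tuned model (one simple way to fuse multi-model importance)."""
    return top_k(base_scores, k) & top_k(tuned_scores, k)

def disjoint(elected):
    """Remove from each task's set every position another task also claims."""
    masks = {}
    for name, idx in elected.items():
        others = set().union(*(s for n, s in elected.items() if n != name))
        masks[name] = idx - others
    return masks

def merge(base, tuned_models, masks, scale=1.0):
    """Add each task's masked update (tuned - base) onto the base weights."""
    merged = list(base)
    for name, tuned in tuned_models.items():
        for i in masks[name]:
            merged[i] += scale * (tuned[i] - base[i])
    return merged

# Made-up weights and gradients for two hypothetical task models.
base = [1.0, 1.0, 1.0, 1.0]
tuned_models = {
    "safety": [1.5, 1.0, 2.0, 1.0],
    "math":   [1.0, 0.5, 3.0, 1.0],
}
# Gradients as (base model, fine-tuned model) on each task's data.
grads = {
    "safety": ([0.9, 0.1, 0.8, 0.2], [0.8, 0.1, 0.5, 0.2]),
    "math":   ([0.1, 0.7, 0.9, 0.2], [0.1, 0.8, 0.4, 0.2]),
}

elected = {
    name: elect(locate(base, g_base), locate(tuned_models[name], g_tuned), k=2)
    for name, (g_base, g_tuned) in grads.items()
}
masks = disjoint(elected)
merged = merge(base, tuned_models, masks)
print(elected)
print(masks)
print(merged)
```

In this toy setup both tasks elect position 2, so the disjoint step excludes it from both updates and the merged weights keep the base value there. Whether a contested position should be dropped, assigned to one task, or rescaled is a design choice that this sketch does not settle.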
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 52.39 | 983 |
| Mathematical Reasoning | MATH | Accuracy | 16.12 | 643 |
| Mathematical Reasoning | AIME | AIME Accuracy | 26.67 | 283 |
| Science Question Answering | ARC Challenge | Accuracy | 34.58 | 234 |
| Code Generation | HumanEval | Pass@1 | 45.12 | 108 |
| Science Question Answering | ARC Easy | Accuracy | 35.1 | 101 |
| Safety Alignment | HarmBench | ASR | 4 | 88 |
| Code Generating | MBPP | Pass@1 | 47.2 | 88 |
| Code Generation | LiveCodeBench | Pass@1 | 19.27 | 86 |
| Reasoning | GSM8K | -- | -- | 83 |