LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

About

Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs. While model merging offers a training-free solution to integrate multiple task-specific models, existing methods suffer from safety-utility conflicts where enhanced general capabilities degrade safety safeguards. We identify two root causes: neuron misidentification due to simplistic parameter magnitude-based selection, and cross-task neuron interference during merging. To address these challenges, we propose LED-Merging, a three-stage framework that Locates task-specific neurons via gradient-based attribution, dynamically Elects critical neurons through multi-model importance fusion, and Disjoints conflicting updates through parameter isolation. Extensive experiments on Llama-3-8B, Mistral-7B, and Llama2-13B demonstrate that LED-Merging effectively reduces harmful response rates, showing a 31.4% decrease on Llama-3-8B-Instruct on HarmBench, while simultaneously preserving 95% of utility performance, such as achieving 52.39% accuracy on GSM8K. LED-Merging resolves safety-utility conflicts and provides a lightweight, training-free paradigm for constructing reliable multi-task LLMs. Code is available at https://github.com/MqLeet/LED-Merging.
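
The three stages named in the abstract can be illustrated with a toy sketch over flat parameter lists. The abstract does not specify the attribution score, the fusion rule, or the isolation rule, so everything below is an assumption made for illustration and not the authors' implementation: `locate` uses |gradient × weight| as a stand-in score, `elect` keeps positions ranked in the top fraction for both the base and the fine-tuned model, and `disjoint` drops any position claimed by more than one task. All function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Mask = List[bool]


def locate(params: Vector, grads: Vector) -> Vector:
    """Locate (assumed score): importance of each parameter as |grad * weight|."""
    return [abs(g * w) for g, w in zip(grads, params)]


def top_k_mask(scores: Vector, ratio: float) -> Mask:
    """Mark the top `ratio` fraction of positions by score (at least one)."""
    k = max(1, int(len(scores) * ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(scores))]


def elect(base_scores: Vector, task_scores: Vector, ratio: float) -> Mask:
    """Elect (assumed fusion): keep positions important in BOTH models."""
    base_mask = top_k_mask(base_scores, ratio)
    task_mask = top_k_mask(task_scores, ratio)
    return [b and t for b, t in zip(base_mask, task_mask)]


def disjoint(masks: Dict[str, Mask]) -> Dict[str, Mask]:
    """Disjoint (assumed isolation): drop positions elected by several tasks."""
    n = len(next(iter(masks.values())))
    counts = [sum(m[i] for m in masks.values()) for i in range(n)]
    return {
        name: [m[i] and counts[i] == 1 for i in range(n)]
        for name, m in masks.items()
    }


def merge(base: Vector, task_vectors: Dict[str, Vector],
          masks: Dict[str, Mask], scale: float = 1.0) -> Vector:
    """Add each task vector (fine-tuned minus base) only where its mask is set."""
    merged = list(base)
    for name, tau in task_vectors.items():
        for i, keep in enumerate(masks[name]):
            if keep:
                merged[i] += scale * tau[i]
    return merged


# Toy example: two tasks both elect position 1, so it is left at the base value.
base = [1.0, 1.0, 1.0, 1.0]
elected = {
    "safety": [True, True, False, False],
    "math": [False, True, True, False],
}
task_vectors = {
    "safety": [0.5, 0.5, 0.5, 0.5],
    "math": [-0.25, -0.25, -0.25, -0.25],
}
isolated = disjoint(elected)
merged = merge(base, task_vectors, isolated)
print(merged)  # [1.5, 1.0, 0.75, 1.0]
```

In this toy run the overlapping position receives no update from either task, which is one simple way to read "parameter isolation"; a real implementation would operate on per-layer tensors and may resolve overlaps differently.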

Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao • 2025

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Mathematical Reasoning	GSM8K	Accuracy	52.39	1398
Mathematical Reasoning	MATH	Accuracy	16.12	882
Science Question Answering	ARC Challenge	Accuracy	34.58	354
Mathematical Reasoning	AIME	AIME Accuracy	26.67	288
Code Generation	HumanEval	Pass@1	45.12	171
Science Question Answering	ARC Easy	Accuracy	35.1	162
Knowledge	MMLU	Accuracy	80.83	161
Safety Evaluation	HarmBench	HarmBench Score	2	127
General Knowledge Evaluation	MMLU	MMLU Accuracy	71.93	127
Reasoning	GSM8K	--	--	111

Showing 10 of 46 rows
