To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging

About

Fine-tuning pre-trained models on targeted datasets enhances task-specific performance but often comes at the expense of generalization. Model merging techniques, which integrate multiple fine-tuned models into a single multi-task model through task arithmetic, offer a promising solution. However, task interference remains a fundamental challenge, leading to performance degradation and suboptimal merged models. Existing approaches have largely overlooked the fundamental roles of neurons, their connectivity, and their activation, resulting in a merging process and a merged model that do not consider how neurons relay and process information. In this work, we present the first study that relies on neuronal mechanisms for model merging. Specifically, we decompose task-specific representations into two complementary neuronal subspaces that regulate input sensitivity and task adaptability. Leveraging this decomposition, we introduce NeuroMerging, a novel merging framework developed to mitigate task interference within neuronal subspaces, enabling training-free model fusion across diverse tasks. Through extensive experiments, we demonstrate that NeuroMerging achieves superior performance compared to existing methods on multi-task benchmarks across both natural language and vision domains. Our findings highlight the importance of aligning neuronal mechanisms in model merging, offering new insights into mitigating task interference and improving knowledge fusion. Our project is available at https://ZzzitaoFang.github.io/projects/NeuroMerging/.
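
The abstract names two ingredients without giving formulas: task arithmetic, and a split of task-specific weights into two complementary per-neuron subspaces. The Python sketch below illustrates one plausible reading and is not the authors' implementation. It assumes that each row of a weight matrix is one neuron, that a task vector is the fine-tuned weight minus the pre-trained weight, and that the two subspaces are the direction of the pre-trained neuron weight and its orthogonal complement. The function names and the scaling factor `lam` are hypothetical.

```python
import numpy as np


def task_vector(finetuned, pretrained):
    """Task vector: what fine-tuning added on top of the pre-trained weights."""
    return finetuned - pretrained


def split_by_neuron(tau, w0, eps=1e-12):
    """Split each row (neuron) of tau into two complementary parts.

    Assumption for illustration: the parts are the projection onto the
    matching pre-trained row of w0 and the orthogonal remainder.
    """
    norm_sq = np.sum(w0 * w0, axis=1, keepdims=True)
    coeff = np.sum(tau * w0, axis=1, keepdims=True) / np.maximum(norm_sq, eps)
    parallel = coeff * w0          # component along the pre-trained neuron
    orthogonal = tau - parallel    # component orthogonal to it
    return parallel, orthogonal


def merge_task_arithmetic(pretrained, finetuned_list, lam=1.0):
    """Plain task arithmetic: pre-trained weights plus scaled sum of task vectors."""
    taus = [task_vector(f, pretrained) for f in finetuned_list]
    return pretrained + lam * np.sum(taus, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w0 = rng.normal(size=(4, 8))                      # 4 neurons, 8 inputs each
    finetuned = [w0 + 0.1 * rng.normal(size=w0.shape) for _ in range(3)]

    par, orth = split_by_neuron(task_vector(finetuned[0], w0), w0)
    # The two parts are orthogonal per neuron and sum back to the task vector.
    print(np.allclose(np.sum(orth * w0, axis=1), 0.0))
    print(merge_task_arithmetic(w0, finetuned, lam=0.3).shape)
```

Under this reading, a merging rule could treat the two parts differently for each neuron before adding them back to the pre-trained weights; how NeuroMerging actually resolves interference within each subspace is not stated in the abstract.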

Zitao Fang, Guodong DU, Shuyang Yu, Yifei Guo, Yiwei Zhang, Yiyao Cao, Jing Li, Ho-Kin Tang, Sim Kuan Goh • 2025

Related benchmarks

Task	Dataset	Result
Multimodal Understanding	MMStar	Accuracy 34.58	511
Multimodal Understanding	SEEDBench2 Plus	Accuracy 40.84	138
Multimodal Understanding	MMMU	Accuracy 36.89	34
Multilingual Multimodal Multiple-Choice Question Answering	Afri-MCQA	Average Accuracy 23.67	15
Visual Question Answering	CVQA	--	14
Multimodal Understanding	XMMMU	Avg_mul 33.47	11
Multilingual Visual Question Answering	MaXM	Avg. Score (MaXM) 20.53	11
Multicultural Visual Reasoning	MaRVL	Avg_mul Score 49.84	10
Visual Question Answering	xGQA	Avg_mul Score 20.98	10
