Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging

About

Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degradation compared with individually fine-tuned models, even on similar tasks, underscoring the need to preserve task-specific information. This paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS first applies singular value decomposition to the task-specific information and retains only a small subset of singular values and vectors. It then introduces a novel thresholding strategy that partitions singular vector elements into groups and assigns a scaling factor to each group. To enable generalization to unseen tasks, we further extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art baselines while requiring only 1% additional storage per task. Furthermore, experiments on unseen tasks show that the DTS variant achieves significantly better generalization performance. Our code is available at https://github.com/krumpguo/DTS.
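The decomposition, thresholding, and scaling steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes that the "task-specific information" is the difference between fine-tuned and pretrained weights, and the grouping rule (a magnitude threshold at a fraction of the mean absolute value, with one scaling factor per group) is a hypothetical stand-in, since the abstract does not specify the actual rule. The names `truncated_svd`, `threshold_and_scale`, `dts_sketch`, `rank`, and `tau_ratio` are all illustrative.

```python
import numpy as np


def truncated_svd(delta, rank):
    """Decomposition: keep only the top-`rank` singular triplets of `delta`."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]


def threshold_and_scale(vec, tau_ratio=0.5):
    """Thresholding + scaling (hypothetical rule, assumed for illustration).

    Elements are split into two groups by magnitude, at a threshold of
    tau_ratio * mean(|vec|). Each element is replaced by its sign times the
    mean magnitude of its group, so only one scaling factor per group (plus
    sign and group membership) would need to be stored.
    """
    mag = np.abs(vec)
    large = mag >= tau_ratio * mag.mean()
    out = np.zeros_like(vec)
    for mask in (large, ~large):
        if mask.any():
            out[mask] = np.sign(vec[mask]) * mag[mask].mean()
    return out


def dts_sketch(pretrained, finetuned, rank=4, tau_ratio=0.5):
    """Approximate the task-specific update of a single weight matrix."""
    delta = finetuned - pretrained  # assumed form of task-specific information
    U, s, Vt = truncated_svd(delta, rank)
    # Compress each left singular vector (column) and right one (row).
    U_q = np.apply_along_axis(threshold_and_scale, 0, U, tau_ratio)
    Vt_q = np.apply_along_axis(threshold_and_scale, 1, Vt, tau_ratio)
    return (U_q * s) @ Vt_q  # compressed approximation of delta
```

Under these assumptions, a personalized model for one task would be obtained by adding `dts_sketch(pretrained, finetuned)` back onto the shared merged weights. The storage saving comes from keeping only a few singular values and a handful of per-group scaling factors instead of a dense update.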

Kuangpu Guo, Yuhe Ding, Jian Liang, Zilei Wang, Ran He • 2025

Related benchmarks

Task	Dataset	Result
Multi-task Language Understanding	MMLU	Accuracy 68.32	353
Bias Evaluation	BBQ	Accuracy 87.3	171
Image Classification	Vision Multi-task Suite (SUN397, Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD)	Average Accuracy 94.24	104
Truthfulness	TruthfulQA	Truthfulness Accuracy 54.12	86
Visual Classification	8 Vision Tasks (SUN397, Stanford Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD)	Average Accuracy 90.4	86
Image Classification	SUN397, Cars, EuroSAT, GTSRB, MNIST, DTD Seen Tasks (test)	SUN397 Accuracy 0.8182	34
Image Classification	RESISC45, SVHN Unseen Tasks (test)	RESISC45 Accuracy 72.98	34
Question Answering	MMLU, TruthfulQA, and BBQ	MMLU Accuracy 68.32	21
Natural Language Understanding	GLUE	CoLA 76.98	16
Natural Language Understanding	GLUE RoBERTa-base (val)	CoLA Score 59.71	16

Showing 10 of 12 rows
