Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging
About
Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degradation compared with individually fine-tuned models, even on similar tasks, underscoring the need to preserve task-specific information. This paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS first applies singular value decomposition to the task-specific information and retains only a small subset of singular values and vectors. It then introduces a novel thresholding strategy that partitions singular vector elements into groups and assigns a scaling factor to each group. To enable generalization to unseen tasks, we further extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art baselines while requiring only 1% additional storage per task. Furthermore, experiments on unseen tasks show that the DTS variant achieves significantly better generalization performance. Our code is available at https://github.com/krumpguo/DTS.
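The decomposition, thresholding, and scaling steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names `dts_compress`, `quantize_by_threshold`, and `dts_reconstruct` are hypothetical, and the grouping rule is an assumption: the abstract does not specify how elements are partitioned, so here singular-vector elements are split into two groups by a magnitude quantile, and each group is replaced by its signs times a single scaling factor (the group's mean magnitude). The input matrix simply stands in for the task-specific information of one layer.

```python
import numpy as np


def quantize_by_threshold(M, quantile=0.5):
    """Replace each element by sign * (one scale per magnitude group).

    ASSUMPTION: two groups split at a magnitude quantile; the paper's
    actual partitioning rule may differ.
    """
    mag = np.abs(M)
    tau = np.quantile(mag, quantile)      # threshold separating the groups
    large = mag >= tau
    out = np.empty_like(M)
    for mask in (large, ~large):          # the two masks cover every element
        scale = mag[mask].mean() if mask.any() else 0.0
        out[mask] = np.sign(M[mask]) * scale
    return out


def dts_compress(task_matrix, rank, quantile=0.5):
    """Truncated SVD followed by thresholding and scaling (hypothetical helper)."""
    U, S, Vt = np.linalg.svd(task_matrix, full_matrices=False)
    # Decomposition: keep only the top-`rank` singular values and vectors.
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank, :]
    # Thresholding + scaling: store signs plus one scale per group.
    return quantize_by_threshold(U, quantile), S, quantize_by_threshold(Vt, quantile)


def dts_reconstruct(U_q, S, Vt_q):
    """Rebuild the approximate task-specific matrix from the compressed parts."""
    return (U_q * S) @ Vt_q


# Toy example: a random matrix stands in for one layer's task-specific weights.
rng = np.random.default_rng(0)
W_task = rng.standard_normal((16, 12))
U_q, S, Vt_q = dts_compress(W_task, rank=4)
approx = dts_reconstruct(U_q, S, Vt_q)
```

Under this assumed scheme, each retained singular vector needs only a sign and a group index per element plus a few scaling factors, which is the kind of saving that makes a small per-task storage overhead plausible.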
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Bias Evaluation | BBQ | Accuracy 87.3 | 99 |
| Multi-task Language Understanding | MMLU | Accuracy 68.32 | 87 |
| Image Classification | Vision Multi-task Suite (SUN397, Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD) | Average Accuracy 94.24 | 72 |
| Image Classification | SUN397, Cars, EuroSAT, GTSRB, MNIST, DTD Seen Tasks (test) | SUN397 Accuracy 0.8182 | 34 |
| Image Classification | RESISC45, SVHN Unseen Tasks (test) | RESISC45 Accuracy 72.98 | 34 |
| Visual Classification | 8 Vision Tasks (SUN397, Stanford Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD) | SUN397 Accuracy 74.15 | 20 |
| Natural Language Understanding | GLUE | CoLA 76.98 | 16 |
| Natural Language Understanding | GLUE RoBERTa-base (val) | CoLA Score 59.71 | 16 |
| Natural Language Understanding | GLUE | CoLA 76.98 | 14 |
| Question Answering | MMLU, TruthfulQA, and BBQ | MMLU Accuracy 68.32 | 14 |