
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

About

In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these issues. Additionally, a one-size-fits-all model lacks flexibility for diverse test data, leading to performance degradation. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input. This approach narrows the performance gap between merged and fine-tuned models and improves adaptability to heterogeneous data. Extensive experiments on 20 datasets for both language and vision tasks demonstrate the effectiveness of our method, showing an average improvement of 28.34% in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on the generative tasks. Our implementation is available at https://github.com/LZY-the-boys/Twin-Merging
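The two stages described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the repository's implementation: task vectors are random stand-ins, the shared expert is taken as the average of task vectors, exclusive knowledge is the per-task residual compressed by truncated SVD, and dynamic merging softmax-weights the exclusive experts over the shared one using hypothetical router scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for fine-tuned task vectors (theta_task - theta_pretrained).
# Shapes and values are illustrative only.
dim = 64
task_vectors = [rng.normal(size=(dim, dim)) for _ in range(3)]

# Stage 1a: shared knowledge, here modeled as the average of task vectors.
shared = np.mean(task_vectors, axis=0)

# Stage 1b: exclusive knowledge = per-task residual, compressed with a
# truncated SVD to reduce redundancy.
def compress(residual: np.ndarray, rank: int = 8) -> np.ndarray:
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

exclusive = [compress(v - shared) for v in task_vectors]

# Stage 2: dynamic merging - softmax the (hypothetical) router scores for
# the current input and add the weighted exclusive experts to the shared one.
def merge_for_input(router_logits: np.ndarray) -> np.ndarray:
    weights = np.exp(router_logits - router_logits.max())
    weights /= weights.sum()
    return shared + sum(w * e for w, e in zip(weights, exclusive))

merged = merge_for_input(np.array([2.0, 0.1, -1.0]))
print(merged.shape)  # (64, 64)
```

The router logits here are hard-coded; in the paper's setting they would come from a learned router conditioned on the test input, which is what gives the merged model its adaptability to heterogeneous data.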

Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng• 2024

Related benchmarks

Task | Dataset | Result | Rank
Bias Evaluation | BBQ | Accuracy: 86.46 | 99
Multi-task Language Understanding | MMLU | Accuracy: 68.14 | 87
Image Classification | Vision Multi-task Suite (SUN397, Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD) | Average Accuracy: 93.01 | 72
Image Classification | SUN397, Cars, EuroSAT, GTSRB, MNIST, DTD Seen Tasks (test) | SUN397 Accuracy: 0.8195 | 34
Image Classification | RESISC45, SVHN Unseen Tasks (test) | RESISC45 Accuracy: 69.02 | 34
Visual Classification | 8 Vision Tasks (SUN397, Stanford Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD) | SUN397 Accuracy: 71.56 | 20
Natural Language Understanding | GLUE RoBERTa-base (val) | CoLA Score: 59.12 | 16
Natural Language Understanding | GLUE | CoLA: 76.02 | 16
Natural Language Understanding | GLUE | CoLA: 76.27 | 14
Question Answering | MMLU, TruthfulQA, and BBQ | MMLU Accuracy: 68.07 | 14

(10 of 11 rows shown)
