
Revisiting Weight Averaging for Model Merging

About

Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among task-specific parameters. In this paper, we present the intriguing result that weight averaging implicitly induces task vectors centered at the averaged weights, and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively reduces task interference and that most of the task-specific knowledge is concentrated in the top singular vectors. Our method demonstrates robust and scalable performance on vision benchmarks across varying numbers of tasks and model sizes. Furthermore, our approach extends to natural language processing tasks with competitive performance.

Jiho Choi, Donggyun Kim, Chanhyuk Lee, Seunghoon Hong • 2024
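The merging rule described in the abstract can be sketched in a few lines. The sketch below assumes one weight matrix per fine-tuned model, a single rank hyperparameter, and a hypothetical scaling coefficient `alpha` for the summed low-rank terms; the paper's actual per-layer procedure and hyperparameters may differ.

```python
import numpy as np

def merge_low_rank(task_weights, rank, alpha=1.0):
    """Average the fine-tuned weights, then add back rank-`rank`
    approximations of the average-centered task vectors.

    task_weights: list of (d_out, d_in) arrays, one per fine-tuned task model.
    rank: number of top singular components kept per centered task vector.
    alpha: hypothetical scaling coefficient (not specified in the abstract).
    """
    theta_avg = np.mean(task_weights, axis=0)
    merged = theta_avg.copy()
    for theta in task_weights:
        tau = theta - theta_avg  # task vector centered at the weight average
        U, S, Vt = np.linalg.svd(tau, full_matrices=False)
        # keep only the top-`rank` singular directions, where most
        # task-specific knowledge is claimed to concentrate
        merged += alpha * (U[:, :rank] * S[:rank]) @ Vt[:rank]
    return merged
```

One sanity check on this formulation: because the task vectors are centered at the mean, they sum to zero, so keeping the full rank with `alpha=1` recovers plain weight averaging; truncating the rank is what makes the method differ from the average.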

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | 20 Vision Classification Tasks | Average Accuracy | 87.9 | 94 |
| Image Classification | 14 Vision Tasks | Average Accuracy | 88.7 | 84 |
| Image Classification | Vision Datasets (14 tasks) 1.0 (test) | Average Accuracy | 88 | 67 |
| Visual Classification | 8 Vision Tasks (SUN397, Stanford Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD) | Average Accuracy | 92.6 | 60 |
| Text Classification | GLUE | Average Score | 69.97 | 28 |
| Multi-task image classification | Vision Benchmark 20-task | Average Accuracy | 87.9 | 24 |
| Multi-task image classification | 8-task vision benchmark | Average Accuracy | 92.6 | 24 |
