
Revisiting Weight Averaging for Model Merging

About

Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among task-specific parameters. In this paper, we present the intriguing result that weight averaging implicitly induces task vectors centered at the averaged weights, and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively reduces task interference and that most of the task-specific knowledge is concentrated in the top singular vectors. Our method demonstrates robust and scalable performance on vision benchmarks across varying numbers of tasks and model sizes. Furthermore, our approach extends to natural language processing tasks with competitive performance.

Jiho Choi, Donggyun Kim, Chanhyuk Lee, Seunghoon Hong • 2024
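The merging rule described in the abstract can be sketched in a few lines. The sketch below assumes one weight matrix per fine-tuned model, a single rank hyperparameter, and a hypothetical scaling coefficient `alpha` for the summed low-rank terms; the paper's actual per-layer procedure and hyperparameters may differ.

```python
import numpy as np

def merge_low_rank(task_weights, rank, alpha=1.0):
    """Average the fine-tuned weights, then add back rank-`rank`
    approximations of the average-centered task vectors.

    task_weights: list of (d_out, d_in) arrays, one per fine-tuned task model.
    rank: number of top singular components kept per centered task vector.
    alpha: hypothetical scaling coefficient (not specified in the abstract).
    """
    theta_avg = np.mean(task_weights, axis=0)
    merged = theta_avg.copy()
    for theta in task_weights:
        tau = theta - theta_avg  # task vector centered at the weight average
        U, S, Vt = np.linalg.svd(tau, full_matrices=False)
        # keep only the top-`rank` singular directions, where most
        # task-specific knowledge is claimed to concentrate
        merged += alpha * (U[:, :rank] * S[:rank]) @ Vt[:rank]
    return merged
```

One sanity check on this formulation: because the task vectors are centered at the mean, they sum to zero, so keeping the full rank with `alpha=1` recovers plain weight averaging; truncating the rank is what makes the method differ from the average.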

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | 20 Vision Classification Tasks | Average Accuracy | 87.9 | 94 |
| Image Classification | 14 Vision Tasks | Average Accuracy | 88.7 | 84 |
| Image Classification | Vision Datasets (14 tasks) 1.0 (test) | Average Accuracy | 88 | 67 |
| Visual Classification | 8 Vision Tasks (SUN397, Stanford Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD) | Average Accuracy | 92.6 | 60 |
| Text Classification | GLUE | Average Score | 69.97 | 28 |
| Multi-task image classification | Vision Benchmark 20-task | Average Accuracy | 87.9 | 24 |
| Multi-task image classification | 8-task vision benchmark | Average Accuracy | 92.6 | 24 |
