Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

About

Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their specialized capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight interpolation-based methods being the predominant approaches. However, these conventional approaches are not well-suited for scenarios where models become available sequentially, and they often suffer from high memory requirements and potential interference between tasks. In this study, we propose a training-free projection-based continual merging method that processes models sequentially through orthogonal projections of weight matrices and adaptive scaling mechanisms. Our method operates by projecting new parameter updates onto subspaces orthogonal to existing merged parameter updates while using an adaptive scaling mechanism to maintain stable parameter distances, enabling efficient sequential integration of task-specific knowledge. Our approach maintains constant memory complexity to the number of models, minimizes interference between tasks through orthogonal projections, and retains the performance of previously merged models through adaptive task vector scaling. Extensive experiments on CLIP-ViT models demonstrate that our method achieves a 5-8% average accuracy improvement while maintaining robust performance in different task orderings.

Anke Tang, Enneng Yang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao• 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH 500	Top-1 Accuracy79.7	384
Scientific Reasoning	GPQA D	Accuracy (%)38	77
Continual Learning	Standard CL Benchmark	Avg Final Acc0.602	71
General Capability	Aggregate (GPQA-D, GSM8K, HumanEval, MATH-500, MBPP, MMLU-Pro)	Average Accuracy72.93	66
Question Answering	GPQA Diamond	Accuracy38	61
Language Understanding	MMLU-Pro	MMLU-Pro Accuracy66.3	60
Continual Model Merging	Sequence 14-task	ACC83.5	57
Continual Learning	Large Number of Tasks	Average Performance50.5	50
Continual Model Merging	sequence 8-task	Accuracy87	33
Continual Model Merging	20-task sequence	Accuracy76	33

Showing 10 of 23 rows

Other info

Follow for update

@wizwand_team Discord