
MAGIC: Achieving Superior Model Merging via Magnitude Calibration

About

The proliferation of pre-trained models has given rise to a wide array of specialised, fine-tuned models. Model merging aims to merge the distinct capabilities of these specialised models into a unified model, requiring minimal or even no additional training. A core objective of model merging is to ensure the merged model retains the behavioural characteristics of the specialised models, typically achieved through feature alignment. We identify that features consist of two critical components: direction and magnitude. Prior research has predominantly focused on directional alignment, while the influence of magnitude remains largely neglected, despite its pronounced vulnerability to perturbations introduced by common merging operations (e.g., parameter fusion and sparsification). Such perturbations to magnitude inevitably lead to feature deviations in the merged model from the specialised models, resulting in subsequent performance degradation. To address this, we propose MAGnItude Calibration (MAGIC), a plug-and-play framework that rectifies layer-wise magnitudes in feature and weight spaces, with three variants. Specifically, our Feature Space Calibration (FSC) realigns the merged model's features using a small set of unlabelled data, while Weight Space Calibration (WSC) extends this calibration to the weight space without requiring additional data. Combining these yields Dual Space Calibration (DSC). Comprehensive experiments demonstrate that MAGIC consistently boosts performance across diverse Computer Vision tasks (+4.3% on eight datasets) and NLP tasks (+8.0% on Llama) without additional training. Our code is available at: https://github.com/lyymuwu/MAGIC
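The direction/magnitude split described above can be illustrated with a toy example. The sketch below is not the authors' implementation (see their repository for that). It only shows, under simplifying assumptions, how plain averaging shrinks the norm of a merged vector and how a single scalar rescaling restores the magnitude without changing the direction. The helper names `norm`, `average`, and `calibrate_magnitude`, the two-dimensional vectors, and the choice of the mean reference norm as the calibration target are all hypothetical.

```python
import math

def norm(v):
    """Euclidean (L2) magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v))

def average(vectors):
    """Element-wise mean of equally sized vectors (simple parameter fusion)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def calibrate_magnitude(merged, references):
    """Rescale `merged` so its norm equals the mean norm of `references`.

    Assumption: the mean reference norm is used as the target; the paper may
    define the target differently. Direction is left untouched.
    """
    target = sum(norm(r) for r in references) / len(references)
    current = norm(merged)
    if current == 0.0:
        return list(merged)  # a zero vector has no direction to rescale
    scale = target / current
    return [x * scale for x in merged]

# Two hypothetical per-layer vectors from specialised models,
# pointing in orthogonal directions with the same magnitude.
task_a = [3.0, 0.0]
task_b = [0.0, 3.0]

merged = average([task_a, task_b])  # averaging shrinks the norm below 3.0
calibrated = calibrate_magnitude(merged, [task_a, task_b])

print(norm(merged), norm(calibrated))
```

In this toy case the merged vector keeps a direction halfway between the two inputs but loses roughly 30% of its magnitude; the rescaling step recovers the magnitude only. A layer-wise version would apply the same idea per layer, to features (which needs data to compute them) or to weights (which does not).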

Yayuan Li, Jian Zhang, Jintao Guo, Zihan Cheng, Lei Qi, Yinghuan Shi, Yang Gao • 2025

Related benchmarks

Task | Dataset | Result | Rank
Instruction Following | AlpacaEval | Win Rate: 51.2 | 125
Image Classification | SUN397, Cars, RESISC45, EuroSAT, SVHN, GTSRB, MNIST, DTD (test) | SUN397: 83.3 | 80
Natural Language Processing | BERT NLP Task Suite (ANLI, Rotten Tomatoes, CoLA, SMS) (test) | ANLI Accuracy: 51.5 | 12
Mathematical Reasoning | GSM8K | GSM8K Accuracy: 51.1 | 7
Natural Language Understanding | T0 Evaluation Suite IA3 PEFT (held-out) | RTE: 71.9 | 6
