
Defending Unauthorized Model Merging via Dual-Stage Weight Protection

About

The rapid proliferation of pretrained models and open repositories has made model merging a convenient yet risky practice, allowing free-riders to combine fine-tuned models into a new multi-capability model without authorization. Such unauthorized model merging not only violates intellectual property rights but also undermines model ownership and accountability. To address this issue, we present MergeGuard, a proactive dual-stage weight protection framework that disrupts merging compatibility while maintaining task fidelity. In the first stage, we redistribute task-relevant information across layers via L2-regularized optimization, ensuring that important gradients are evenly dispersed. In the second stage, we inject structured perturbations to misalign task subspaces, breaking curvature compatibility in the loss landscape. Together, these stages reshape the model's parameter geometry such that merged models collapse into destructive interference while the protected model remains fully functional. Extensive experiments on both vision (ViT-L-14) and language (Llama2, Gemma2, Mistral) models demonstrate that MergeGuard reduces merged model accuracy by up to 90% with less than 1.5% performance loss on the protected model.
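The two stages described above can be illustrated with a minimal NumPy sketch on a per-layer "task vector" (the difference between fine-tuned and base weights). The function names, the uniform-energy target in stage one, and the orthogonal-noise construction in stage two are illustrative assumptions for intuition only, not the paper's actual optimization:

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_redistribute(task_vector, lam=0.5):
    # Hypothetical stand-in for the L2-regularized redistribution:
    # shrink each layer's task-vector norm toward the mean norm,
    # dispersing task-relevant energy more evenly across layers.
    norms = np.array([np.linalg.norm(v) for v in task_vector])
    target = norms.mean()
    return [v * ((1 - lam) + lam * target / (n + 1e-12))
            for v, n in zip(task_vector, norms)]

def stage2_perturb(task_vector, eps=0.1):
    # Hypothetical stand-in for the structured perturbation:
    # add noise orthogonal to each layer's task direction, so the
    # task subspace is misaligned for merging while the component
    # carrying the task signal is left unchanged.
    out = []
    for v in task_vector:
        noise = rng.standard_normal(v.shape)
        # project out the component of the noise along v
        noise -= (noise @ v) / (v @ v + 1e-12) * v
        scale = eps * np.linalg.norm(v) / (np.linalg.norm(noise) + 1e-12)
        out.append(v + scale * noise)
    return out

# Toy example: three layers with deliberately uneven task energy.
tv = [rng.standard_normal(16) * s for s in (0.5, 1.0, 2.0)]
protected = stage2_perturb(stage1_redistribute(tv))
```

In this toy version, stage one flattens the per-layer norm profile and stage two leaves each layer's projection onto its original task direction intact, which is the geometric intuition behind keeping the protected model functional while breaking merging compatibility.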

Wei-Jia Chen, Min-Yen Tsai, Cheng-Yi Lee, Chia-Mu Yu • 2025

Related benchmarks

| Task                 | Dataset  | Metric   | Result | Rank |
|----------------------|----------|----------|--------|------|
| Image Classification | EuroSAT  | Accuracy | 97.4   | 497  |
| Image Classification | DTD      | Accuracy | 82.16  | 419  |
| Classification       | Cars     | Accuracy | 90.3   | 314  |
| Image Classification | GTSRB    | Accuracy | 98.25  | 291  |
| Image Classification | RESISC45 | Accuracy | 97.25  | 263  |
| Image Classification | SUN397   | Accuracy | 81.52  | 246  |
| Image Classification | MNIST    | Accuracy | 99.27  | 48   |
| Image Classification | SVHN     | Accuracy | 96.82  | 30   |
