Decomposing Task Vectors for Refined Model Editing

About

Large pre-trained models have transformed machine learning, yet adapting them to exhibit precise, concept-specific behaviors remains a significant challenge. Task vectors, defined as the difference between fine-tuned and pre-trained model parameters, provide a mechanism for steering neural networks toward desired behaviors. This has given rise to large repositories of task vectors tailored to specific behaviors. Arithmetic on these task vectors allows desired behaviors to be combined without the need for large datasets. However, the vectors often contain overlapping concepts that interfere with each other during arithmetic operations, leading to unpredictable outcomes. We propose a principled decomposition method that separates each task vector into two components: one capturing knowledge shared across multiple task vectors, and another isolating information unique to each specific task. By identifying invariant subspaces across projections, our approach enables more precise control over concept manipulation without unintended amplification or diminution of other behaviors. We demonstrate the effectiveness of the decomposition in three domains: improving multi-task merging in image classification by 5% by using shared components as additional task vectors; enabling clean style mixing in diffusion models without generation degradation by mixing only the unique components; and achieving a 47% toxicity reduction in language models, while preserving performance on general knowledge tasks, by negating the toxic information isolated in the unique component. Our approach provides a new framework for understanding and controlling task vector arithmetic, addressing fundamental limitations in model editing operations.
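
To make the terminology concrete, the following is a minimal NumPy sketch of task vector arithmetic as described in the abstract: a task vector is the per-parameter difference between fine-tuned and pre-trained weights, and editing adds scaled task vectors back onto the pre-trained weights (a negative coefficient negates a behavior). The function names, the toy 4x4 weight matrix, and the synthetic "fine-tuned" models are all hypothetical. The `naive_split` function is only a stand-in: it uses the element-wise mean as the shared component and the residual as the unique component, which is not the invariant-subspace decomposition the paper proposes.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned minus pre-trained parameters, per tensor."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def apply_task_vectors(pretrained, vectors, coeffs):
    """Add scaled task vectors to copies of the pre-trained weights.
    A negative coefficient negates the corresponding behavior."""
    edited = {name: w.copy() for name, w in pretrained.items()}
    for vec, c in zip(vectors, coeffs):
        for name in edited:
            edited[name] += c * vec[name]
    return edited

def naive_split(vectors):
    """Stand-in decomposition (NOT the paper's method):
    shared = element-wise mean of the task vectors, unique = residual."""
    names = list(vectors[0].keys())
    shared = {n: np.mean([v[n] for v in vectors], axis=0) for n in names}
    unique = [{n: v[n] - shared[n] for n in names} for v in vectors]
    return shared, unique

# Toy setup: one weight matrix, two synthetic "fine-tuned" models that
# share a common update plus a task-specific one (all values are made up).
rng = np.random.default_rng(0)
pretrained = {"w": rng.normal(size=(4, 4))}
common = 0.1 * rng.normal(size=(4, 4))
finetuned_a = {"w": pretrained["w"] + common + 0.1 * rng.normal(size=(4, 4))}
finetuned_b = {"w": pretrained["w"] + common + 0.1 * rng.normal(size=(4, 4))}

tv_a = task_vector(pretrained, finetuned_a)
tv_b = task_vector(pretrained, finetuned_b)
shared, (unique_a, unique_b) = naive_split([tv_a, tv_b])

# Adding the full task vector recovers the fine-tuned model.
restored_a = apply_task_vectors(pretrained, [tv_a], [1.0])

# Negating only the unique part of task A leaves the shared part untouched.
negated_a = apply_task_vectors(pretrained, [unique_a], [-1.0])
```

In this sketch each task vector is exactly the sum of its shared and unique parts, so an edit can target one part without touching the other. How well such a split isolates concepts depends entirely on the decomposition used, which is where the paper's method would replace `naive_split`.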

Hamed Damirchi, Ehsan Abbasnejad, Zhen Zhang, Javen Shi • 2025

Related benchmarks

Task	Dataset	Result
Language Understanding	MMLU	Accuracy 39.7	844
Reasoning	BBH	Accuracy 37.8	770
Image Classification	Stanford Cars	Accuracy 71.8	705
Image Classification	EuroSAT	Accuracy 94.7	569
Image Classification	RESISC45	Accuracy 83.7	539
Image Classification	DTD	Accuracy 65.2	487
Image Classification	SUN397	Accuracy 65.1	450
Image Classification	MNIST	Accuracy 97.1	398
Image Classification	SVHN	Accuracy 86.4	395
Image Classification	GTSRB	Accuracy 95.9	291

Showing 10 of 13 rows
