Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

About

The rapid development of AI systems has been greatly influenced by the emergence of foundation models. A common approach for targeted problems involves fine-tuning these pre-trained foundation models for specific target tasks, resulting in a rapid spread of models fine-tuned across a diverse array of tasks. This work focuses on the problem of merging multiple fine-tunings of the same foundation model derived from a spectrum of auxiliary tasks. We introduce a new, simple method, Model Breadcrumbs, which consists of a sparsely defined weight set that guides model adaptation within the weight space of a pre-trained model. These breadcrumbs are constructed by subtracting the pre-trained model's weights from those of the fine-tuned model, followed by a sparsification process that eliminates weight outliers and negligible perturbations. Our experiments demonstrate the effectiveness of Model Breadcrumbs in simultaneously improving performance across multiple tasks. This contribution aligns with the evolving paradigm of updatable machine learning, reminiscent of the collaborative principles underlying open-source software development, fostering a community-driven effort to reliably update machine learning models. Our method is shown to be more efficient and, unlike previous proposals, does not require hyperparameter tuning for each new task added. Through extensive experimentation involving various models, tasks, and modalities, we establish that integrating Model Breadcrumbs offers a simple, efficient, and highly effective approach for constructing multi-task models and facilitating updates to foundation models.
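The procedure described above can be sketched in a few lines: take the difference between fine-tuned and pre-trained weights, mask out both the largest-magnitude outliers and the smallest, negligible perturbations, and add the masked deltas back onto the pre-trained weights. This is a minimal illustrative sketch, not the authors' implementation; the quantile thresholds `beta` and `gamma` and the scaling factor `alpha` are placeholder values chosen for illustration.

```python
import numpy as np

def breadcrumb_mask(delta, beta=0.85, gamma=0.99):
    """Keep only weights whose |delta| lies between the beta and gamma
    magnitude quantiles, dropping tiny perturbations (below beta) and
    large outliers (above gamma). Thresholds here are placeholders."""
    mag = np.abs(delta)
    lo, hi = np.quantile(mag, [beta, gamma])
    return (mag >= lo) & (mag <= hi)

def merge_breadcrumbs(pretrained, finetuned_list, alpha=0.3,
                      beta=0.85, gamma=0.99):
    """Merge several fine-tunings of one pre-trained model by adding
    their sparsified weight deltas back onto the pre-trained weights."""
    merged = pretrained.copy()
    for ft in finetuned_list:
        delta = ft - pretrained                    # task direction
        mask = breadcrumb_mask(delta, beta, gamma) # sparsify the delta
        merged += alpha * (delta * mask)           # apply masked update
    return merged
```

In practice each parameter tensor of the network would be masked independently; the sketch treats the model as a single flat weight vector for brevity.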

MohammadReza Davari, Eugene Belilovsky • 2023

Related benchmarks

Task                                | Dataset    | Metric             | Result | Rank
Mathematical Reasoning              | GSM8K      | Accuracy           | 65.81  | 1362
Mathematical Reasoning              | MATH       | Accuracy           | 1.66   | 882
Code Generation                     | MBPP       | Pass@1             | 53.4   | 88
Safety Alignment                    | HarmBench  | ASR                | 33     | 88
Named Entity Recognition            | MIT Movie  | Entity F1          | 67.69  | 57
Relation Extraction                 | CONLL04    | Relation Strict F1 | 43.62  | 52
Named Entity Recognition            | tweetNER7  | Entity F1          | 47.33  | 49
Safety Alignment                    | SORRY-Bench| ASR                | 35.78  | 40
Relation Extraction                 | CoNLL 04   | F1                 | 40.2   | 39
Multilingual Mathematical Reasoning | MSVAMP     | Accuracy (English) | 36.7   | 33

(Showing 10 of 18 rows)
