
ZipIt! Merging Models from Different Tasks without Training

About

Typical deep visual recognition models are capable of performing the one task they were trained on. In this paper, we tackle the extremely difficult problem of combining distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. Prior work in model merging permutes one model to the space of the other then averages them together. While this works for models trained on the same task, we find that this fails to account for the differences in models trained on disjoint tasks. Thus, we introduce "ZipIt!", a general method for merging two arbitrary models of the same architecture that incorporates two simple strategies. First, in order to account for features that aren't shared between models, we expand the model merging problem to allow for merging features within each model by defining a general "zip" operation. Second, we add support for partially zipping the models up until a specified layer, naturally creating a multi-head model. We find that these two changes combined account for 20-60% improvement over prior work, making it more feasible to merge models trained on disjoint tasks without retraining.
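The idea of merging features within each model, not only across the two models, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how features are matched, so the Pearson-correlation score and the greedy pairing below are assumptions made for illustration, and the names `correlation` and `zip_features` are hypothetical. The sketch also works on recorded activations only and does not show how a merge would be applied to the network weights.

```python
from itertools import combinations
from statistics import mean, pstdev


def correlation(x, y):
    """Pearson correlation of two equal-length activation lists."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no correlation signal
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def zip_features(feats_a, feats_b):
    """Pair up features from the combined feature set and average each pair.

    feats_a, feats_b: lists of features, one per model; each feature is a
    list of activations recorded on the same inputs.
    Returns (merged, pairs), where pairs holds indices into feats_a + feats_b.
    Assumption: matching is greedy on correlation (illustrative choice).
    """
    combined = feats_a + feats_b
    n = len(combined)
    # Score every candidate pair, including pairs from the SAME model.
    # Permutation-based merging would only consider cross-model pairs.
    scored = sorted(
        ((correlation(combined[i], combined[j]), i, j)
         for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    used, pairs = set(), []
    for _, i, j in scored:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    # Each matched pair collapses into one feature by simple averaging.
    merged = [
        [(a + b) / 2 for a, b in zip(combined[i], combined[j])]
        for i, j in pairs
    ]
    return merged, pairs


if __name__ == "__main__":
    # Model A has two redundant features; model B has two redundant
    # features that are unrelated to A's.
    feats_a = [[1, 2, 3, 4], [2, 4, 6, 8]]
    feats_b = [[1, -1, 1, -1], [2, -2, 2, -2]]
    merged, pairs = zip_features(feats_a, feats_b)
    print(sorted(pairs))
```

In this toy input each model's two features are redundant with each other and poorly correlated with the other model's, so both matched pairs come from within a single model. A method restricted to cross-model pairs would be forced to average unrelated features here, which is the failure mode on disjoint tasks that the abstract describes.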

George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman • 2023

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | CUB-200 2011 | Accuracy: 66.1 | 356
Image Classification | Oxford-IIIT Pets | Accuracy: 86.1 | 306
Image Classification | DomainNet (test) | -- | 219
Image Classification | Stanford Dogs | Accuracy: 60.6 | 153
Image Classification | NABirds | Accuracy: 8 | 37
Image Classification | CIFAR-100 50+50 | Joint Accuracy: 72.8 | 25
Image Classification | CIFAR-100 Task A 50 classes | Accuracy: 79.9 | 16
Image Classification | CIFAR100 50+50 | Joint Accuracy: 54.69 | 14
Image Classification | CIFAR-100 50+50 (Joint) | Accuracy: 63.39 | 12
Image Classification | CIFAR-100 Task B 50 classes | Accuracy: 74.24 | 12
Showing 10 of 16 rows
