
Dataless Knowledge Fusion by Merging Weights of Language Models

About

Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Often, fine-tuned models are readily available but their training data is not, due to data privacy or intellectual property concerns. This creates a barrier to fusing knowledge across individual models to yield a better single model. In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that both performs well across all data set domains and generalizes to out-of-domain data. We propose a dataless knowledge fusion method that merges models in their parameter space, guided by weights that minimize prediction differences between the merged model and the individual models. Over a battery of evaluation settings, we show that the proposed method significantly outperforms baselines such as Fisher-weighted averaging or model ensembling. Further, we find that our method is a promising alternative to multi-task learning that can preserve or sometimes improve over the individual models without access to the training data. Finally, model merging is more efficient than training a multi-task model, making it applicable to a wider set of scenarios.

Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng• 2022
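The merging objective described in the abstract — minimizing prediction differences between the merged model and the individual models — admits a closed-form solution for linear layers: a Gram-matrix-weighted average of the individual weight matrices. The sketch below illustrates that idea in NumPy. It is a minimal illustration, not the paper's released implementation; the function name `regmean_merge` and the toy matrices are hypothetical, and the Gram matrices `X_i.T @ X_i` stand in for activation statistics that would be collected from each model's inputs.

```python
import numpy as np

def regmean_merge(weights, grams):
    """Merge linear-layer weight matrices in parameter space.

    Minimizes the summed squared prediction differences
    ||X_i W - X_i W_i||^2 over each model's inputs X_i, where
    grams[i] = X_i.T @ X_i is the uncentered Gram matrix of
    model i's input activations. The minimizer is
    W = (sum_i G_i)^{-1} (sum_i G_i W_i); pinv is used for
    numerical stability if the summed Gram matrix is singular.
    """
    num = sum(G @ W for G, W in zip(grams, weights))
    den = sum(grams)
    return np.linalg.pinv(den) @ num

# Toy usage: merge two hypothetical 3x2 linear layers.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
X1, X2 = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
G1, G2 = X1.T @ X1, X2.T @ X2
W_merged = regmean_merge([W1, W2], [G1, G2])
```

Note that when both models share the same weights, the formula recovers them exactly, since the weighted average of identical matrices is that matrix; the Gram weighting only matters when the models disagree.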

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Classification | CIFAR-100 | Accuracy | 82.59 | 691 |
| Image Classification | Stanford Cars | Accuracy | 70.8 | 635 |
| Image Classification | EuroSAT | Accuracy | 78.6 | 569 |
| Image Classification | Food-101 | Accuracy | 76.14 | 542 |
| Image Classification | DTD | Accuracy | 30.53 | 542 |
| Natural Language Understanding | GLUE | SST-2 | 90.6 | 531 |
| Image Classification | DTD | Accuracy | 52 | 485 |
| Natural Language Inference | RTE | Accuracy | 81.2 | 448 |
| Image Classification | SUN397 | Accuracy | 69.5 | 441 |
| Image Classification | SUN397 | Accuracy | 58.58 | 425 |
Showing 10 of 87 rows
