
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

About

Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. Despite its conceptual simplicity, fine-tuning entails several troublesome engineering choices, such as selecting hyperparameters and choosing checkpoints from the optimization trajectory. To tackle the difficulty of choosing the best model, one effective solution is model fusion, which combines multiple models in parameter space. However, we observe a large discrepancy between the loss and metric landscapes during the fine-tuning of pre-trained language models. Building on this observation, we introduce a novel model fusion technique that optimizes both the desired metric and the loss through multi-objective Bayesian optimization. In addition, to select hyperparameters effectively, we establish a two-stage procedure by integrating Bayesian optimization processes into our framework. Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.
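The core operation described above, combining checkpoints in parameter space and searching for fusion weights that trade off loss against a task metric, can be sketched as follows. This is a minimal illustration, not the paper's method: the helper names (`fuse`, `evaluate_loss`, `evaluate_metric`) are assumptions, the objectives are toy stand-ins for held-out evaluation, and a scalarized grid search replaces the multi-objective Bayesian optimization used in the paper.

```python
# Hedged sketch: parameter-space fusion of fine-tuning checkpoints.
# A scalarized search over the fusion coefficient stands in for the
# paper's multi-objective Bayesian optimization.

def fuse(checkpoints, weights):
    """Weighted average of checkpoints, each a dict of parameter lists."""
    assert abs(sum(weights) - 1.0) < 1e-9, "fusion weights should sum to 1"
    fused = {}
    for key in checkpoints[0]:
        fused[key] = [
            sum(w * ckpt[key][i] for w, ckpt in zip(weights, checkpoints))
            for i in range(len(checkpoints[0][key]))
        ]
    return fused

# Toy stand-ins: in practice both come from evaluating the fused model
# on a held-out set, and they need not agree (the loss/metric gap the
# paper observes is what motivates treating them as two objectives).
def evaluate_loss(params):
    return sum(v * v for vs in params.values() for v in vs)

def evaluate_metric(params):
    return -abs(sum(v for vs in params.values() for v in vs) - 1.0)

ckpt_a = {"layer": [0.0, 2.0]}  # two hypothetical fine-tuning checkpoints
ckpt_b = {"layer": [2.0, 0.0]}

# Scalarize the two objectives (metric minus loss) and search the
# fusion coefficient over a coarse grid of candidates.
best = max(
    ([lam, 1 - lam] for lam in [i / 10 for i in range(11)]),
    key=lambda w: evaluate_metric(fuse([ckpt_a, ckpt_b], w))
                  - evaluate_loss(fuse([ckpt_a, ckpt_b], w)),
)
# For these toy checkpoints the even mixture minimizes the loss term.
```

A real multi-objective Bayesian optimization would replace the grid with a surrogate model and an acquisition function over the Pareto front of (loss, metric) pairs, which is what makes the approach sample-efficient when each evaluation requires a full validation pass.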

Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Question Answering | SQuAD 2.0 | F1 | 81.82 | 190
Question Answering | SQuAD v2.0 (dev) | F1 | 81.82 | 158
Abstractive Summarization | SamSum | ROUGE-2 | 28.78 | 73
General Language Understanding | GLUE v1 (test dev) | MNLI | 87.86 | 40
Summarization | SamSum (test) | ROUGE-1 | 53.4 | 18
Dialogue Generation | E2E | BLEU | 64.81 | 10
Multiple-choice Question Answering | KorMedMCQA (test) | Accuracy (Doctor) | 45.31 | 7
Summarization | Summarization | Grade | 73.18 | 6

Other info

Code
