
LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

About

Fine-tuning is becoming widely used for leveraging the power of pre-trained foundation models in new downstream tasks. While fine-tuning has seen many successes across tasks, recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions (i.e., out-of-distribution; OOD). To improve OOD generalization, some previous studies identify the limitations of fine-tuning data and regulate fine-tuning to preserve the general representation learned from pre-training data. However, potential limitations in the pre-training data and models are often ignored. In this paper, we contend that overly relying on the pre-trained representation may hinder fine-tuning from learning essential representations for downstream tasks and thus hurt its OOD generalization. This can be especially catastrophic when new tasks come from different (sub)domains than the pre-training data. To address the issues in both pre-training and fine-tuning data, we propose a novel generalizable fine-tuning method, LEVI (Layer-wise Ensemble of different VIews), in which the pre-trained model is adaptively ensembled layer-wise with a small task-specific model while preserving its efficiency. By combining two complementary models, LEVI effectively suppresses problematic features in both the fine-tuning data and the pre-trained model, and preserves features useful for new tasks. Extensive experiments with large language and vision models show that LEVI greatly improves fine-tuning generalization by emphasizing different views from the fine-tuning data and the pre-trained features.
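The page does not give LEVI's exact architecture, so the following is only a toy sketch of the general idea the abstract describes: a (frozen) pre-trained tower and a small task-specific tower whose hidden states are mixed at every layer. All dimensions, the projection matrices, and the fixed mixing weight `alpha` are illustrative assumptions (in practice the mixing would be learned per layer).

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(d_in, d_out):
    # Random weight matrix standing in for a trained linear layer.
    return rng.normal(scale=0.1, size=(d_in, d_out))

# Hypothetical sizes: a "large" pre-trained tower and a small task tower.
D_PRE, D_TASK, DEPTH = 8, 4, 3

pre_layers  = [linear(D_PRE, D_PRE)   for _ in range(DEPTH)]  # pre-trained view
task_layers = [linear(D_TASK, D_TASK) for _ in range(DEPTH)]  # small task-specific view
proj        = [linear(D_TASK, D_PRE)  for _ in range(DEPTH)]  # align widths for mixing
alpha = 0.5  # layer-wise mixing weight (fixed here; adaptive/learned in the paper's setting)

def forward(x_pre, x_task):
    h_pre, h_task = x_pre, x_task
    for W_pre, W_task, W_proj in zip(pre_layers, task_layers, proj):
        h_pre  = np.tanh(h_pre @ W_pre)
        h_task = np.tanh(h_task @ W_task)
        # Layer-wise ensemble: combine the two views at every layer,
        # rather than only averaging final outputs.
        h_pre = alpha * h_pre + (1 - alpha) * (h_task @ W_proj)
    return h_pre

out = forward(rng.normal(size=(2, D_PRE)), rng.normal(size=(2, D_TASK)))
print(out.shape)  # (2, 8)
```

Mixing at every layer (instead of ensembling only the final predictions) is what lets the small model's task-specific features correct intermediate pre-trained representations.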

Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Object Hallucination Evaluation | POPE | -- | 935 |
| Image Classification | Flowers102 | Accuracy: 3.7 | 478 |
| Image Classification | Food101 | Accuracy: 20.8 | 309 |
| Multimodal Model Evaluation | MMBench | Accuracy: 64 | 180 |
| Image Classification | Caltech101 | Accuracy: 43.5 | 162 |
| Multimodal Model Evaluation | MME | Total Score: 1750 | 63 |
| Scientific Question Answering | ScienceQA (image) | Accuracy: 69.4 | 53 |
| Visual Perception | MMVP | Accuracy: 61.3 | 47 |
| Vision-centric Evaluation | CV-Bench | Accuracy: 0.474 | 21 |
| Visual Question Answering | TextVQA | Accuracy: 49.2 | 7 |
