
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

About

A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift.
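The core idea — update only a chosen subset of layers while freezing the rest — can be illustrated in the paper's two-layer setting. Below is a minimal NumPy sketch of "first-layer tuning": one gradient step on a network `pred = w2 · relu(W1 x)` that updates only the first-layer weights `W1` and leaves the head `w2` frozen. The squared loss, shapes, and function name are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def surgical_step(W1, w2, x, y, lr=0.1):
    """One gradient step that updates only the first layer W1.

    Two-layer net: pred = w2 . relu(W1 x). The head w2 stays frozen,
    mirroring first-layer tuning. Squared loss and shapes are
    illustrative assumptions.
    """
    h = np.maximum(W1 @ x, 0.0)        # hidden activations
    pred = w2 @ h                      # scalar prediction
    err = pred - y                     # d(loss)/d(pred) for 0.5*err^2
    grad_h = err * w2 * (h > 0)        # backprop through frozen head + ReLU gate
    grad_W1 = np.outer(grad_h, x)      # gradient w.r.t. W1 only
    return W1 - lr * grad_W1           # w2 is never touched
```

In a full-network analogue (e.g. PyTorch), the same effect is achieved by setting `requires_grad = False` on every parameter outside the chosen block before building the optimizer; which block to leave trainable depends on the type of distribution shift, as the abstract notes.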

Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText-2 (test) | - | 1541 |
| Image Classification | DTD | Accuracy: 65.7 | 419 |
| Image Classification | SVHN | Accuracy: 96.08 | 359 |
| Image Classification | PACS | - | 230 |
| Image Classification | FGVCAircraft | Accuracy: 57.94 | 225 |
| Image Classification | Digits-Five | Accuracy (Source: mt): 97.35 | 44 |
| Semantic Segmentation | Cityscapes to ACDC (test) | mIoU: 59.4 | 38 |
| Image Classification | STL-10 | Accuracy: 96.92 | 33 |
| Visual Question Answering | Ultra-MedVQA Task 4 | Accuracy: 62.16 | 26 |
| Visual Question Answering | Ultra-MedVQA Task 5 | Accuracy: 70.23 | 26 |
Showing 10 of 20 rows
