Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
About
A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift.
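The core idea above — tune only the layer subset matched to the shift type, freezing the rest — can be sketched in plain Python. The layer names and the prefix-based selection rule here are illustrative assumptions, not the paper's exact implementation.

```python
def surgical_freeze(param_names, tune_prefixes):
    """Return a {parameter_name: trainable} map that marks only parameters
    whose name starts with one of tune_prefixes as trainable, freezing the
    rest (the "surgical" layer selection)."""
    return {name: any(name.startswith(p) for p in tune_prefixes)
            for name in param_names}

# Hypothetical parameter names for a small conv net with a linear head.
params = ["block1.conv.weight", "block1.conv.bias",
          "block2.conv.weight", "block2.conv.bias",
          "head.fc.weight", "head.fc.bias"]

# For an input-level shift such as image corruption, the paper's finding
# suggests tuning only the first block:
mask = surgical_freeze(params, tune_prefixes=("block1.",))
```

In a deep-learning framework, the resulting mask would typically be applied by setting each parameter's gradient flag (e.g. `requires_grad` in PyTorch) before constructing the optimizer.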
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText-2 (test) | -- | 1541 |
| Image Classification | DTD | Accuracy: 65.7 | 419 |
| Image Classification | SVHN | Accuracy: 96.08 | 359 |
| Image Classification | PACS | -- | 230 |
| Image Classification | FGVCAircraft | Accuracy: 57.94 | 225 |
| Image Classification | Digits-Five | Accuracy (Source: mt): 97.35 | 44 |
| Semantic Segmentation | Cityscapes to ACDC (test) | mIoU: 59.4 | 38 |
| Image Classification | STL-10 | Accuracy: 96.92 | 33 |
| Visual Question Answering | Ultra-MedVQA Task 4 | Accuracy: 62.16 | 26 |
| Visual Question Answering | Ultra-MedVQA Task 5 | Accuracy: 70.23 | 26 |