
Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models

About

Modern optimizers such as AdamW, equipped with momentum and adaptive learning rates, are designed to escape local minima and explore the vast parameter space. This exploration is beneficial for finding good loss basins when training from scratch. It is not necessarily ideal when resuming from a powerful foundation model, because it can lead to large deviations from the pre-trained initialization and, consequently, worse robustness and generalization. At the same time, strong regularization on all parameters can lead to under-fitting. We hypothesize that selectively regularizing the parameter space is the key to both fitting the new task and retaining the pre-trained knowledge. This paper proposes a new weight decay technique, Selective Projection Decay (SPD), that selectively imposes a strong penalty on certain layers while allowing others to change freely. Intuitively, SPD expands the parameter search space for layers with consistent loss reduction and contracts it for layers with inconsistent loss reduction. Experimentally, when equipped with SPD, Adam consistently provides better in-distribution generalization and out-of-distribution robustness on multiple popular vision and language benchmarks. Code is available at https://github.com/GT-RIPL/Selective-Projection-Decay.git
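
The idea of penalizing only some layers can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It uses a plain gradient step instead of Adam, and every name in it (selective_decay_step, strength, the dictionary layout) is hypothetical. The selection rule is also an assumption made for illustration: a layer is treated as having inconsistent loss reduction when the descent direction opposes its displacement from the pre-trained weights, and only such layers are pulled back toward their pre-trained values.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def selective_decay_step(
    params: Dict[str, Vector],
    pretrained: Dict[str, Vector],
    grads: Dict[str, Vector],
    lr: float = 0.1,
    strength: float = 0.5,
) -> Tuple[Dict[str, Vector], List[str]]:
    """One gradient step with a per-layer, selectively applied pull
    toward the pre-trained weights (illustrative sketch only)."""
    updated: Dict[str, Vector] = {}
    penalized: List[str] = []
    for name, theta in params.items():
        theta0 = pretrained[name]
        grad = grads[name]
        # How far this layer has moved away from its pre-trained values.
        deviation = [t - t0 for t, t0 in zip(theta, theta0)]
        # Plain gradient step (stand-in for an Adam update).
        stepped = [t - lr * g for t, g in zip(theta, grad)]
        # Assumed selection rule: the descent direction (-grad) opposes
        # the displacement from the pre-trained weights.
        if dot(grad, deviation) > 0.0:
            # Shrink the stepped weights toward the pre-trained weights.
            stepped = [
                t0 + (1.0 - strength) * (s - t0)
                for s, t0 in zip(stepped, theta0)
            ]
            penalized.append(name)
        updated[name] = stepped
    return updated, penalized
```

With strength set to 0 the step reduces to an unregularized update for every layer, and with strength set to 1 a penalized layer is reset to its pre-trained weights, so the parameter interpolates between the two extremes the abstract contrasts.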

Junjiao Tian, Chengyue Huang, Zsolt Kira • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | DomainNet (test) | Average Accuracy: 45.93 | 209 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | BoolQ Accuracy: 72.9 | 138 |
| Image Classification | ImageNet Robustness Variants (Adversarial, Rendition, Sketch) V2 (test) | Accuracy (ID): 84.21 | 10 |
| Semantic segmentation | Pascal Semantic Segmentation ID Clean (test) | mIoU (Clean): 74.27 | 9 |
| Semantic segmentation | Pascal Semantic Segmentation OOD Corrupted (test) | mIoU (Fog): 0.7174 | 9 |
| Visual Question Answering | VQA v2 (ID) | Accuracy: 87.39 | 6 |
| Visual Question Answering | IV-VQA (Near OOD) | Accuracy: 0.9525 | 6 |
| Visual Question Answering | VQA-Rephrasings (Near OOD) | Accuracy: 79.48 | 6 |
| Visual Question Answering | VQA-CP Near OOD v2 | Accuracy: 87.27 | 6 |
| Visual Question Answering | VQA-CE Near OOD | Accuracy: 73.52 | 6 |
Showing 10 of 15 rows
