
Context-Aware Robust Fine-Tuning

About

Contrastive Language-Image Pre-trained (CLIP) models can classify an image as belonging to "[CLASS]" in a zero-shot manner, using the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Because "[CONTEXT]" can be filled with an exhaustive range of text cues, the CLIP model is aware of different contexts, e.g. background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation showing that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-Tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning so that it continues to capture context information. Specifically, we use zero-shot prompt weights to obtain the distribution over contexts contained in an image. By minimizing the Kullback-Leibler Divergence (KLD) between the context distributions induced by the original and the fine-tuned CLIP models, CAR-FT lets downstream tasks inherit the context-aware ability of CLIP and achieves both higher In-Distribution (ID) and Out-Of-Distribution (OOD) accuracy. Experimental results show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also bringing accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous Domain Generalization (DG) methods and reaches 78.5% average accuracy on the DomainBed benchmark, establishing a new state of the art.
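The context regularizer described above can be sketched in a few lines. This is a minimal, framework-free Python sketch under stated assumptions, not the authors' implementation: the helper names (`context_distribution`, `average_context_weight`, `car_ft_loss`), the temperature of 0.01, building each context weight by averaging one template's class text embeddings, the KL direction KL(p_zs || p_ft), and the weight `lam` are all assumptions chosen for illustration. In practice the image features would come from the frozen zero-shot CLIP image encoder and from the encoder being fine-tuned.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (CLIP compares normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_context_weight(class_text_embeddings: Sequence[Vector]) -> List[float]:
    """ASSUMPTION: one context weight per template (e.g. 'a sketch of a {}'),
    built by averaging that template's normalized text embeddings over all classes."""
    normed = [normalize(e) for e in class_text_embeddings]
    dim = len(normed[0])
    return [sum(e[i] for e in normed) / len(normed) for i in range(dim)]


def context_distribution(image_feat: Vector,
                         context_weights: Sequence[Vector],
                         temperature: float = 0.01) -> List[float]:
    """Softmax over cosine similarities between one image feature and each
    context weight; the temperature value is an assumption."""
    img = normalize(image_feat)
    logits = [sum(a * b for a, b in zip(img, normalize(w))) / temperature
              for w in context_weights]
    return softmax(logits)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same contexts."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def car_ft_loss(ce_loss: float,
                zs_image_feat: Vector,
                ft_image_feat: Vector,
                context_weights: Sequence[Vector],
                lam: float = 1.0) -> float:
    """Hypothetical total objective: task cross-entropy plus a KL penalty that
    keeps the fine-tuned model's context distribution close to the zero-shot one."""
    p_zs = context_distribution(zs_image_feat, context_weights)  # frozen original CLIP
    p_ft = context_distribution(ft_image_feat, context_weights)  # model being fine-tuned
    return ce_loss + lam * kl_divergence(p_zs, p_ft)


if __name__ == "__main__":
    ctx = [[1.0, 0.0], [0.0, 1.0]]  # two toy context weights
    # Identical features give a KL term of exactly 0, so this prints 0.5.
    print(car_ft_loss(0.5, [0.9, 0.1], [0.9, 0.1], ctx))
    # Features that drift to a different context are penalized (> 0.5).
    print(car_ft_loss(0.5, [0.9, 0.1], [0.1, 0.9], ctx))
```

In a real training loop (e.g. in PyTorch), only the fine-tuned branch would receive gradients; the zero-shot context distribution acts as a fixed target, which is what discourages fine-tuning from overwriting the context information in the features.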

Xiaofeng Mao, Yuefeng Chen, Xiaojun Jia, Rong Zhang, Hui Xue, Zhao Li• 2022

Related benchmarks

| Task | Dataset | Metric | Value (%) | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet V2 (test) | Top-1 Accuracy | 72.8 | 232 |
| Image Classification | CIFAR-100 | Accuracy | 65.9 | 204 |
| Image Classification | ImageNet-A (test) | -- | -- | 177 |
| Image Classification | ImageNet-R (test) | Accuracy | 75.6 | 170 |
| Image Classification | ImageNet-Sketch (test) | -- | -- | 153 |
| Domain Generalization | DomainBed | Average Accuracy | 78.5 | 127 |
| Image Classification | ImageNet Rendition | Top-1 Accuracy | 75.37 | 113 |
| Image Classification | ImageNet and Distribution Shifts | ImageNet-V2 Accuracy | 75.8 | 49 |
| Image Classification | DomainBed v1.0 (test) | Average Accuracy | 78.5 | 36 |
| Image Classification | ImageNet and derived distribution shifts standard suite (test val) | IN Accuracy (ref.) | 86.3 | 32 |
Showing 10 of 19 rows
