
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

About

Recent vision-language pre-trained models (VL-PTMs) have shown remarkable success in open-vocabulary tasks. However, downstream use cases often involve further fine-tuning of VL-PTMs, which may distort their general knowledge and impair their ability to handle distribution shifts. In real-world scenarios, machine learning systems inevitably encounter both covariate shifts (e.g., changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of enhancing out-of-distribution (OOD) generalization on covariate shifts while simultaneously detecting semantic-shifted unseen classes. Thus a critical but underexplored question arises: how can fine-tuning improve VL-PTMs' generalization to closed-set OOD data while still effectively detecting open-set unseen classes? In this paper, we propose a novel objective function for OOD detection that also serves to improve OOD generalization. We show that minimizing the gradient magnitude of energy scores on training data leads to domain-consistent Hessians of the classification loss, which our theoretical analysis reveals to be a strong indicator of OOD generalization. Based on this finding, we develop a unified fine-tuning framework that allows for concurrent optimization of both tasks. Extensive experiments demonstrate the superiority of our method. The code is available at https://github.com/LinLLLL/CRoFT.
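To make the phrase "gradient magnitude of energy scores" concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common energy score E(z) = -log Σ_k exp(f_k(z)) with temperature 1, a hypothetical linear head f(z) = W z, and a gradient taken with respect to the input features z. The abstract does not specify any of these choices, and the function names are illustrative only. Under these assumptions the gradient has the closed form ∇_z E = -Wᵀ softmax(W z).

```python
import math

def logits(W, z):
    # Hypothetical linear head: one logit per row of W.
    return [sum(w * x for w, x in zip(row, z)) for row in W]

def softmax(f):
    # Numerically stable softmax over a list of logits.
    m = max(f)
    e = [math.exp(v - m) for v in f]
    s = sum(e)
    return [v / s for v in e]

def energy(W, z):
    # Energy score E(z) = -log sum_k exp(f_k(z)), computed stably.
    f = logits(W, z)
    m = max(f)
    return -(m + math.log(sum(math.exp(v - m) for v in f)))

def energy_grad(W, z):
    # Closed-form gradient for the linear head: dE/dz = -W^T softmax(W z).
    p = softmax(logits(W, z))
    return [-sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(z))]

def energy_grad_norm_sq(W, z):
    # Squared gradient magnitude; a penalty of this kind would be averaged
    # over training samples (assumed form, not taken from the paper).
    return sum(g * g for g in energy_grad(W, z))

# Two classes, two features, identity weights, zero input.
W = [[1.0, 0.0], [0.0, 1.0]]
z = [0.0, 0.0]
print(energy(W, z), energy_grad(W, z), energy_grad_norm_sq(W, z))
```

With identity weights and a zero input, both logits are 0, so the energy is -log 2, the gradient is (-0.5, -0.5), and the squared magnitude is 0.5. In an actual fine-tuning setup the gradient would come from automatic differentiation through the full model rather than from a closed form.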

Lin Zhu, Yifeng Yang, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye • 2024

Related benchmarks

Task | Dataset | Result | Rank
Domain Generalization | PACS OOD (test) | Average Accuracy: 97.3 | 31
Out-of-Distribution Detection | VLCS Open-Set (DTD, Food101, Caltech101) | AUC (DTD): 86.7 | 28
Out-of-Distribution Detection | PACS Open-Set (DTD, Food101, Caltech101) | DTD AUC: 94.7 | 28
Image Classification | Classification Suite (OxfordPets, EuroSAT, Caltech101, DTD, FGVCAircraft, Flowers102, UCF101, Food101, SUN397, StanfordCars, Imagenet), Few-shot, CLIP RN50 pre-trained (test) | OxfordPets Accuracy: 89.97 | 26
Open-Set OOD Detection | ImageNet Setup-I 1.0 (test) | AUROC: 87.2 | 24
OOD Generalization | ImageNet Setup-I 1.0 (test) | ID Accuracy: 83.1 | 24
OOD Generalization | VLCS | OOD Accuracy: 80.2 | 18