A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-Time Adaptation for Vision-Language Models

About

In deep learning, maintaining model robustness against distribution shifts is critical. This work explores a broad range of ways to adapt vision-language foundation models at test time, with a particular emphasis on CLIP and its variants. The study systematically examines prompt-based techniques and existing test-time adaptation methods, aiming to improve robustness under distribution shift in diverse real-world scenarios. Specifically, the investigation covers several prompt engineering strategies: handcrafted prompts, prompt ensembles, and prompt learning techniques. Additionally, we introduce a vision-text-space ensemble that substantially improves average performance compared to text-space-only ensembles. Since online test-time adaptation has been shown to mitigate performance drops under distribution shift, the study also evaluates existing test-time adaptation methods that were originally designed for vision-only classification models. Extensive experiments across multiple datasets and diverse model architectures demonstrate the effectiveness of these adaptation strategies. Code is available at: https://github.com/mariodoebler/test-time-adaptation
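
To make the idea of a text-space prompt ensemble concrete, here is a minimal sketch. It is not the authors' implementation: the embeddings are random stand-ins for the outputs of a CLIP-style text and image encoder, and the function names, the array shapes, and the `logit_scale` value of 100 are assumptions chosen for illustration. The sketch averages the normalized embeddings of several prompt templates per class, renormalizes, and classifies images by cosine similarity.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def ensemble_class_embeddings(text_emb):
    """Average per-template text embeddings into one embedding per class.

    text_emb: array of shape (num_classes, num_templates, dim), e.g. the
    encoder output for "a photo of a {class}", "a sketch of a {class}", ...
    (hypothetical templates).
    """
    per_prompt = l2_normalize(text_emb)   # normalize each prompt embedding
    mean_emb = per_prompt.mean(axis=1)    # average over templates
    return l2_normalize(mean_emb)         # renormalize the class prototype


def zero_shot_probs(image_emb, class_emb, logit_scale=100.0):
    """Softmax over scaled cosine similarities (logit_scale is an assumption)."""
    image_emb = l2_normalize(image_emb)
    logits = logit_scale * image_emb @ class_emb.T
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


# Random stand-ins for encoder outputs (assumption: no real model is loaded).
rng = np.random.default_rng(0)
num_classes, num_templates, dim = 10, 7, 512
text_emb = rng.normal(size=(num_classes, num_templates, dim))
image_emb = rng.normal(size=(4, dim))

class_emb = ensemble_class_embeddings(text_emb)
probs = zero_shot_probs(image_emb, class_emb)
predictions = probs.argmax(axis=-1)
```

With a real model, `text_emb` and `image_emb` would come from the text and image encoders. The vision-text-space ensemble mentioned in the abstract is not reproduced here, since its details are not given in this summary.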

Mario Döbler, Robert A. Marsden, Tobias Raichle, Bin Yang · 2024

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | CIFAR-10C Severity Level 5 (test) | Average Error Rate (Severity 5): 64.16 | 62
Image Classification | CIFAR-100-C v1 (test) | Error Rate (Average): 33.14 | 60
Image Classification | ImageNet-C 1.0 (test) | -- | 53
Image Classification | CIFAR-100C Level 5 (test) | Gaussian Acc: 17.97 | 45
Image Classification | CIFAR-100-C | Accuracy (Corruption): 48.53 | 44
Image Classification | ImageNet-C Severity 5 (test) | Error Rate (Gaussian): 9.18 | 42
Image Classification | CIFAR-10-C v1 (test) | -- | 28
Image Classification | ImageNet-C | Gaussian Blur Error Rate: 26.36 | 13
Image Classification | CIFAR10-C | Acc (Gauss): 63.9 | 13