
$\texttt{BATCLIP}$: Bimodal Online Test-Time Adaptation for CLIP

About

Although open-vocabulary classification models like Contrastive Language Image Pretraining (CLIP) have demonstrated strong zero-shot learning capabilities, their robustness to common image corruptions remains poorly understood. Through extensive experiments, we show that zero-shot CLIP lacks robustness to common image corruptions at test time, necessitating the adaptation of CLIP to unlabeled corrupted images using test-time adaptation (TTA). However, we find that existing TTA methods have severe limitations in adapting CLIP due to their unimodal nature. To address these limitations, we propose $\texttt{BATCLIP}$, a bimodal $\textbf{online}$ TTA method designed to improve CLIP's robustness to common image corruptions. The key insight of our approach is not only to adapt the visual encoders for improving image features but also to strengthen the alignment between image and text features by promoting a stronger association between the image class prototype, computed using pseudo-labels, and the corresponding text feature. We evaluate our approach on benchmark image corruption datasets and achieve state-of-the-art results in online TTA for CLIP. Furthermore, we evaluate our proposed TTA approach on various domain generalization datasets to demonstrate its generalization capabilities. Our code is available at https://github.com/sarthaxxxxx/BATCLIP
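The prototype-to-text association described in the abstract can be illustrated with a small NumPy sketch. This is only one plausible reading of the abstract, not the authors' implementation: the function name `prototype_text_alignment`, the use of mean cosine similarity as the alignment score, and the choice to skip classes absent from the batch are all assumptions made for illustration. The official code is in the repository linked above.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def prototype_text_alignment(image_feats, text_feats):
    """Hypothetical helper (not from the paper's code).

    image_feats: (N, D) image features for one unlabeled test batch.
    text_feats:  (C, D) text features, one per class prompt.
    Returns pseudo-labels, the classes present in the batch, and the
    mean cosine similarity between each class prototype and its text feature.
    """
    img = l2_normalize(np.asarray(image_feats, dtype=np.float64))
    txt = l2_normalize(np.asarray(text_feats, dtype=np.float64))

    # Zero-shot prediction: nearest text feature by cosine similarity.
    logits = img @ txt.T                      # shape (N, C)
    pseudo_labels = logits.argmax(axis=1)     # shape (N,)

    # Class prototype = mean image feature over samples sharing a pseudo-label.
    present = np.unique(pseudo_labels)
    prototypes = np.stack(
        [img[pseudo_labels == c].mean(axis=0) for c in present]
    )
    prototypes = l2_normalize(prototypes)

    # Alignment score (assumed form): average prototype/text cosine similarity.
    alignment = float(np.mean(np.sum(prototypes * txt[present], axis=1)))
    return pseudo_labels, present, alignment
```

In an actual adaptation loop, the negative of such a score would be minimized with an autodiff framework while updating encoder parameters; NumPy is used here only to keep the example self-contained.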

Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky, Rogerio Feris, Yunhui Guo • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | CIFAR-10C Severity Level 5 (test) | Average Error Rate (Severity 5): 73.79 | 62 |
| Image Classification | CIFAR-100-C v1 (test) | Error Rate (Average): 37.68 | 60 |
| Image Classification | ImageNet-C 1.0 (test) | -- | 53 |
| Image Classification | CIFAR-100C Level 5 (test) | Gaussian Acc: 25.52 | 45 |
| Image Classification | CIFAR-100-C | Accuracy (Corruption): 50.89 | 44 |
| Image Classification | ImageNet-C Severity 5 (test) | Error Rate (Gaussian): 19.48 | 42 |
| Image Classification | CIFAR-10-C v1 (test) | -- | 28 |
| Image Classification | CIFAR10-C | Acc (Gauss): 74.81 | 13 |
| Image Classification | ImageNet-C | Gaussian Blur Error Rate: 31.44 | 13 |
