
Efficient Test-Time Adaptation of Vision-Language Models

About

Test-time adaptation with pre-trained vision-language models has attracted increasing attention for tackling distribution shifts at test time. Although prior studies have achieved very promising performance, they involve intensive computation, which is severely misaligned with the test-time setting. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time adaptation with vision-language models. TDA maintains a lightweight key-value cache as a dynamic queue, with few-shot pseudo labels as values and the corresponding test-sample features as keys. Leveraging this cache, TDA adapts to test data gradually via progressive pseudo label refinement, which is highly efficient and incurs no backpropagation. In addition, we introduce negative pseudo labeling, which alleviates the adverse impact of pseudo label noise by assigning pseudo labels to certain negative classes when the model is uncertain about its predictions. Extensive experiments over two benchmarks demonstrate TDA's superior effectiveness and efficiency compared with the state of the art. The code has been released at \url{https://kdiaaa.github.io/tda/}.
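The key-value cache described above can be illustrated with a minimal sketch. This is a hypothetical re-implementation in NumPy, not the authors' released code: class names, the `shots_per_class` queue size, and the `beta` affinity sharpness are all illustrative assumptions. Keys are (unit-normalized) test-sample features, values are their pseudo labels, and each per-class queue keeps only the lowest-entropy entries, mimicking progressive pseudo label refinement without any backpropagation.

```python
import numpy as np

class DynamicAdapterCache:
    """Illustrative training-free key-value cache (sketch, not TDA's code).

    cache maps each class index to a short queue of (entropy, feature) pairs;
    low-entropy (confident) entries are kept, high-entropy ones are evicted.
    """

    def __init__(self, num_classes, shots_per_class=3, beta=5.0):
        self.num_classes = num_classes
        self.shots = shots_per_class        # queue capacity per class (assumed)
        self.beta = beta                    # affinity kernel sharpness (assumed)
        self.cache = {c: [] for c in range(num_classes)}

    @staticmethod
    def _entropy(probs):
        # Prediction entropy, used as the confidence measure for eviction.
        return -np.sum(probs * np.log(probs + 1e-12))

    def update(self, feature, clip_probs):
        """Insert a test feature under its pseudo label; evict the
        highest-entropy overflow so the queue refines over time."""
        c = int(np.argmax(clip_probs))      # pseudo label from model probs
        queue = self.cache[c]
        queue.append((self._entropy(clip_probs), feature))
        queue.sort(key=lambda kv: kv[0])    # most confident entries first
        del queue[self.shots:]              # drop least confident overflow

    def logits(self, feature):
        """Cache logits: affinity between the query feature and cached keys,
        aggregated into per-class scores. No gradients are involved."""
        scores = np.zeros(self.num_classes)
        for c, queue in self.cache.items():
            for _, key in queue:
                affinity = float(feature @ key)  # cosine sim for unit vectors
                scores[c] += np.exp(-self.beta * (1.0 - affinity))
        return scores
```

In use, these cache logits would be added (with some weighting) to the zero-shot logits of the vision-language model; the exact combination rule here is an assumption, not the paper's formula.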

Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, Eric Xing• 2024

Related benchmarks

Task                    Dataset             Result            Rank
Semantic Segmentation   Cityscapes (test)   mIoU 42.6         1154
Image Classification    ImageNet A          Top-1 Acc 61.27   654
Image Classification    ImageNet V2         --                611
Image Classification    Flowers102          Accuracy 71.4     558
Image Classification    DTD                 Accuracy 47.4     542
Image Classification    CIFAR-10            Accuracy 91.73    508
Image Classification    Food101             Accuracy 86.1     457
Image Classification    SUN397              Accuracy 67.6     441
Image Classification    Aircraft            Accuracy 23.9     333
Image Classification    StanfordCars        Accuracy 67.3     312

Showing 10 of 85 rows
