
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models

About

Test-time adaptation, which enables models to generalize to diverse data with unlabeled test samples, holds significant value in real-world scenarios. Recently, researchers have applied this setting to advanced pre-trained vision-language models (VLMs), developing approaches such as test-time prompt tuning to further extend their practical applicability. However, these methods typically adapt VLMs through a single modality only and fail to accumulate task-specific knowledge as more samples are processed. To address this, we introduce Dual Prototype Evolving (DPE), a novel test-time adaptation approach for VLMs that effectively accumulates task-specific knowledge from multiple modalities. Specifically, we create and evolve two sets of prototypes (textual and visual) to progressively capture more accurate multi-modal representations for target classes during test time. Moreover, to promote consistent multi-modal representations, we introduce and optimize learnable residuals for each test sample to align the prototypes from both modalities. Extensive experimental results on 15 benchmark datasets demonstrate that our proposed DPE consistently outperforms previous state-of-the-art methods while also exhibiting competitive computational efficiency. Code is available at https://github.com/zhangce01/DPE-CLIP.

Ce Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie • 2024
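
The idea of prototypes that evolve as test samples arrive can be illustrated with a toy example. The sketch below is not the DPE implementation (see the repository linked in the abstract for that). It assumes a single set of per-class prototypes, cosine-similarity classification, and a count-weighted blend as the update rule, and it omits the textual prototypes, the learnable residuals, and any sample filtering. The class name `EvolvingPrototypes` and its methods are hypothetical, chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two unit-norm vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


class EvolvingPrototypes:
    """Hypothetical helper: one prototype per class, refined at test time."""

    def __init__(self, initial):
        # initial: dict mapping class name -> starting feature vector
        self.protos = {c: normalize(v) for c, v in initial.items()}
        self.counts = {c: 1 for c in initial}

    def predict(self, feature):
        """Return the class whose prototype is most similar to the feature."""
        f = normalize(feature)
        return max(self.protos, key=lambda c: cosine(f, self.protos[c]))

    def update(self, feature):
        """Classify an unlabeled sample, then blend it into that class's prototype."""
        f = normalize(feature)
        label = self.predict(f)
        n = self.counts[label]
        # Count-weighted blend of old prototype and new feature (assumed rule),
        # re-normalized so the prototype stays on the unit sphere.
        blended = [(n * p + x) / (n + 1) for p, x in zip(self.protos[label], f)]
        self.protos[label] = normalize(blended)
        self.counts[label] = n + 1
        return label


# Toy usage with made-up 2-D features.
protos = EvolvingPrototypes({"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
label = protos.update([0.9, 0.2])
print(label, protos.protos["cat"])
```

In this toy setup the pseudo-label comes from the model's own prediction, so early mistakes can pull a prototype toward the wrong class; a practical method would need some safeguard against that, which this sketch leaves out.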

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | Flowers102 | Accuracy | 75.1 | 558 |
| Image Classification | DTD | Accuracy | 54.2 | 542 |
| Image Classification | Food101 | Accuracy | 86.2 | 457 |
| Image Classification | SUN397 | Accuracy | 70.1 | 441 |
| Image Classification | Aircraft | Accuracy | 29 | 333 |
| Image Classification | StanfordCars | Accuracy | 67.3 | 312 |
| Fine-grained visual classification | FGVC-Aircraft (test) | Top-1 Acc | 28.95 | 312 |
| Image Classification | Pets | Accuracy | 91.1 | 245 |
| Image Classification | Caltech101 | Accuracy | 94.8 | 228 |
| Image Classification | ImageNet V2 (test) | Top-1 Accuracy | 65.44 | 216 |
Showing 10 of 53 rows
