
Realistic Test-Time Adaptation of Vision-Language Models

About

The zero-shot capabilities of Vision-Language Models (VLMs) have been widely leveraged to improve predictive performance. However, previous works on transductive or test-time adaptation (TTA) often make strong assumptions about the data distribution, such as the presence of all classes. Our work challenges these favorable deployment scenarios and introduces a more realistic evaluation framework, including: (i) a variable number of effective classes for adaptation within a single batch, and (ii) non-i.i.d. batches of test samples in online adaptation settings. We provide comprehensive evaluations, comparisons, and ablation studies that demonstrate how current transductive or TTA methods for VLMs systematically compromise the models' initial zero-shot robustness across various realistic scenarios, favoring performance gains under advantageous assumptions about the test samples' distributions. Furthermore, we introduce StatA, a versatile method that can handle a wide range of deployment scenarios, including those with a variable number of effective classes at test time. Our approach incorporates a novel regularization term designed specifically for VLMs, which acts as a statistical anchor preserving the initial text-encoder knowledge, particularly in low-data regimes. Code is available at https://github.com/MaxZanella/StatA.
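The abstract names two ingredients without giving formulas: test batches that contain only a subset of the classes, and a statistical anchor that ties class statistics to the text encoder. The NumPy sketch below illustrates both ideas under our own assumptions; it is not StatA (the authors' code is at the repository linked above). The helper names, the CLIP-style logit scale of 100, and the shrinkage rule mu_k = (alpha * t_k + sum_i r_ik * x_i) / (alpha + sum_i r_ik), where t_k is the text embedding of class k and r_ik are soft assignments, are hypothetical choices made for illustration only.

```python
import numpy as np


def sample_batch(labels, batch_size, num_effective, rng):
    # Hypothetical helper: restrict a test batch to `num_effective` randomly
    # chosen classes instead of assuming every class is present.
    chosen = rng.choice(np.unique(labels), size=num_effective, replace=False)
    pool = np.flatnonzero(np.isin(labels, chosen))
    return rng.choice(pool, size=batch_size, replace=False)


def zero_shot_probs(feats, text_emb, scale=100.0):
    # CLIP-style zero-shot posteriors: softmax over scaled cosine similarities
    # between image features (N, D) and class text embeddings (K, D).
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = scale * (f @ t.T)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def anchored_means(feats, resp, text_emb, alpha):
    # Responsibility-weighted class means shrunk toward the text embeddings.
    # A class with (almost) no soft mass in the batch stays at its text
    # embedding; a well-populated class moves toward its empirical mean.
    # `alpha` is a hypothetical prior strength, not a StatA hyperparameter.
    counts = resp.sum(axis=0)   # (K,) soft sample count per class
    sums = resp.T @ feats       # (K, D) soft feature sum per class
    return (alpha * text_emb + sums) / (alpha + counts)[:, None]


# Toy usage on synthetic features (no real VLM involved).
rng = np.random.default_rng(0)
K, D = 5, 16
text_emb = rng.normal(size=(K, D))
labels = np.repeat(np.arange(K), 40)
feats = text_emb[labels] + 0.5 * rng.normal(size=(labels.size, D))

idx = sample_batch(labels, batch_size=30, num_effective=2, rng=rng)
resp = zero_shot_probs(feats[idx], text_emb)
means = anchored_means(feats[idx], resp, text_emb, alpha=10.0)
print("classes in batch:", np.unique(labels[idx]))
print("soft counts:", resp.sum(axis=0).round(2))
```

With alpha = 0 the rule reduces to plain responsibility-weighted means, which are undefined for classes absent from the batch; the prior term is one simple way to keep such classes at their zero-shot position, in the spirit of the anchor the abstract describes.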

Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer, Ismail Ben Ayed • 2025

Related benchmarks

Task                  Dataset        Result           Rank
Image Classification  EuroSAT        Accuracy: 76.7   497
Image Classification  DTD            Accuracy: 59.7   487
Image Classification  Flowers102     Accuracy: 82.9   478
Image Classification  Stanford Cars  Accuracy: 71.9   477
Image Classification  ImageNet       --               429
Image Classification  SUN397         Accuracy: 72.8   425
Image Classification  UCF101         Top-1 Acc: 81.3  404
Image Classification  Food101        Accuracy: 94.2   309
Image Classification  Aircraft       Accuracy: 41.3   302
Image Classification  StanfordCars   Accuracy: 84.4   266
Showing 10 of 20 rows
