
MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models

About

Vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization but remain sensitive to domain shifts at test time. Test-time prompt tuning (TPT) mitigates this issue by adapting prompts with fixed augmentations, which may falter in more challenging settings. In this work, we propose Meta Test-Time Prompt Tuning (MetaTPT), a meta-learning framework that learns a self-supervised auxiliary task to guide test-time prompt tuning. The auxiliary task dynamically learns parameterized augmentations for each sample, enabling more expressive transformations that capture essential features in target domains. MetaTPT adopts a dual-loop optimization paradigm: an inner loop learns a self-supervised task that generates informative views, while the outer loop performs prompt tuning by enforcing consistency across these views. By coupling augmentation learning with prompt tuning, MetaTPT improves test-time adaptation under domain shifts. Extensive experiments demonstrate that MetaTPT achieves state-of-the-art performance on domain generalization and cross-dataset benchmarks.
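The dual-loop idea can be illustrated with a deliberately tiny, NumPy-only sketch. This is not the paper's implementation: the "CLIP-like" model is reduced to a frozen linear head, the prompt to a learnable logit bias, and the parameterized augmentation to a per-dimension scale and shift; both loops here minimize the same entropy-of-the-marginal consistency loss, whereas MetaTPT learns a distinct self-supervised auxiliary task in the inner loop. All names and shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 8, 3                    # toy feature dim and class count (assumptions)
W = rng.normal(size=(C, D))    # frozen "CLIP-like" classifier head (stand-in)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def predict(x, prompt):
    # the prompt is reduced to a learnable logit bias in this sketch
    return softmax(W @ x + prompt)

def augment(x, phi):
    # learned, parameterized augmentation: per-dimension scale and shift
    return x * (1.0 + phi[:D]) + phi[D:]

def consistency_loss(x, noises, phi, prompt):
    # entropy of the marginal prediction over the learned views (TPT-style)
    views = [augment(x + n, phi) for n in noises]
    probs = np.mean([predict(v, prompt) for v in views], axis=0)
    return entropy(probs)

def num_grad(f, theta, eps=1e-4):
    # central-difference gradient, to keep the sketch autodiff-free
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (f(theta + d) - f(theta - d)) / (2 * eps)
    return g

def meta_tpt_adapt(x, outer_steps=10, inner_steps=3, lr=0.5):
    phi = np.zeros(2 * D)      # augmentation parameters (inner loop)
    prompt = np.zeros(C)       # prompt parameters (outer loop)
    noises = [0.05 * rng.normal(size=D) for _ in range(4)]  # frozen view seeds
    for _ in range(outer_steps):
        # inner loop: learn the augmentation that generates informative views
        for _ in range(inner_steps):
            phi -= lr * num_grad(lambda t: consistency_loss(x, noises, t, prompt), phi)
        # outer loop: tune the prompt by enforcing consistency across those views
        prompt -= lr * num_grad(lambda t: consistency_loss(x, noises, phi, t), prompt)
    return phi, prompt, noises

x = rng.normal(size=D)                      # one unlabeled test sample
phi, prompt, noises = meta_tpt_adapt(x)
before = consistency_loss(x, noises, np.zeros(2 * D), np.zeros(C))
after = consistency_loss(x, noises, phi, prompt)
print(f"prediction entropy: {before:.3f} -> {after:.3f}")
```

Running the sketch shows the prediction entropy across views dropping as the augmentation and prompt co-adapt, which is the qualitative behavior the dual-loop scheme targets.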

Yuqing Lei, Yingjun Du, Yawen Huang, Xiantong Zhen, Ling Shao • 2025

Related benchmarks

Task                  Dataset          Metric          Result  Rank
Image Classification  DTD              Accuracy        48.88   419
Image Classification  UCF101           Top-1 Accuracy  72.24   404
Image Classification  Food101          Accuracy        87.61   309
Image Classification  StanfordCars     Accuracy        69.5    266
Image Classification  SUN397           Accuracy        69.17   246
Image Classification  FGVCAircraft     Accuracy        29.05   225
Image Classification  Caltech101       Accuracy        94.9    162
Image Classification  OxfordPets       Accuracy        92.79   113
Image Classification  EuroSAT          Accuracy        54.26   83
Image Classification  Oxford Flowers   Top-1 Accuracy  74.22   78
Showing 10 of 13 rows
