
MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models

About

Vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization but remain sensitive to domain shifts at test time. Test-time prompt tuning (TPT) mitigates this issue by adapting prompts with fixed augmentations, which may falter in more challenging settings. In this work, we propose Meta Test-Time Prompt Tuning (MetaTPT), a meta-learning framework that learns a self-supervised auxiliary task to guide test-time prompt tuning. The auxiliary task dynamically learns parameterized augmentations for each sample, enabling more expressive transformations that capture essential features in target domains. MetaTPT adopts a dual-loop optimization paradigm: an inner loop learns a self-supervised task that generates informative views, while the outer loop performs prompt tuning by enforcing consistency across these views. By coupling augmentation learning with prompt tuning, MetaTPT improves test-time adaptation under domain shifts. Extensive experiments demonstrate that MetaTPT achieves state-of-the-art performance on domain generalization and cross-dataset benchmarks.
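The dual-loop idea can be illustrated with a toy sketch. The snippet below is a minimal stand-in, not the paper's implementation: the sample, classifier, prompt, and augmentation parameters are hypothetical low-dimensional vectors, the inner loop adapts per-sample augmentation parameters against a self-supervised entropy surrogate, and the outer loop tunes the prompt by minimizing the entropy of the average prediction over the learned views (a common consistency objective in TPT-style methods). Gradients are taken numerically to keep the example dependency-light.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: the real method operates on CLIP image/text
# features and a text-prompt context, not raw random vectors.
x = rng.normal(size=8)          # test-sample feature
W = rng.normal(size=(3, 8))     # frozen classifier (class text features)
prompt = np.zeros(8)            # learnable prompt offset
theta = np.zeros(8)             # learnable augmentation parameters

def views(x, theta, n=4):
    """Parameterized augmentation: elementwise scaling plus fixed noise."""
    noise = np.random.default_rng(1).normal(scale=0.1, size=(n, x.size))
    return x * (1.0 + theta) + noise

def probs(v, prompt):
    """Softmax class probabilities for one augmented view."""
    logits = W @ (v + prompt)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum()

def num_grad(f, z, eps=1e-4):
    """Central-difference numerical gradient of scalar f at vector z."""
    g = np.zeros_like(z)
    for i in range(z.size):
        d = np.zeros_like(z); d[i] = eps
        g[i] = (f(z + d) - f(z - d)) / (2 * eps)
    return g

for step in range(5):
    # Inner loop: adapt the augmentation so the generated views remain
    # confidently classifiable (a self-supervised surrogate objective).
    inner = lambda th: np.mean([entropy(probs(v, prompt)) for v in views(x, th)])
    theta -= 0.5 * num_grad(inner, theta)

    # Outer loop: tune the prompt by enforcing consistency across the
    # learned views, here via the entropy of the averaged prediction.
    outer = lambda p: entropy(np.mean([probs(v, p) for v in views(x, theta)], axis=0))
    prompt -= 0.5 * num_grad(outer, prompt)

final = entropy(np.mean([probs(v, prompt) for v in views(x, theta)], axis=0))
print(final)  # entropy of the averaged prediction after adaptation
```

Coupling the two loops is the point: the outer consistency loss is computed on views produced by the inner loop's learned augmentation, so the prompt is tuned against sample-specific transformations rather than a fixed augmentation pool.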

Yuqing Lei, Yingjun Du, Yawen Huang, Xiantong Zhen, Ling Shao • 2025

Related benchmarks

Task                  Dataset         Metric           Result   Rank
Image Classification  DTD             Accuracy          48.88    485
Image Classification  Food101         Accuracy          87.61    457
Image Classification  UCF101          Top-1 Accuracy    72.24    455
Image Classification  SUN397          Accuracy          69.17    441
Image Classification  StanfordCars    Accuracy          69.50    312
Image Classification  FGVCAircraft    Accuracy          29.05    261
Image Classification  Caltech101      Accuracy          94.90    228
Image Classification  EuroSAT         Accuracy          54.26    207
Image Classification  OxfordPets      Accuracy          92.79    160
Image Classification  Oxford Flowers  Top-1 Accuracy    74.22     83
Showing 10 of 13 benchmark rows.
