MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models

About

Vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization but remain sensitive to domain shifts at test time. Test-time prompt tuning (TPT) mitigates this issue by adapting prompts on fixed augmentations of each test sample, but these fixed augmentations may fall short in more challenging settings. In this work, we propose Meta Test-Time Prompt Tuning (MetaTPT), a meta-learning framework that learns a self-supervised auxiliary task to guide test-time prompt tuning. The auxiliary task dynamically learns parameterized augmentations for each sample, enabling more expressive transformations that capture essential features in target domains. MetaTPT adopts a dual-loop optimization paradigm: an inner loop learns a self-supervised task that generates informative views, while the outer loop performs prompt tuning by enforcing consistency across these views. By coupling augmentation learning with prompt tuning, MetaTPT improves test-time adaptation under domain shifts. Extensive experiments demonstrate that MetaTPT achieves state-of-the-art performance on domain generalization and cross-dataset benchmarks.
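The dual-loop structure described above can be illustrated with a toy sketch. This is not the authors' implementation; every name and modeling choice here is a hypothetical stand-in: `tanh(class token + prompt)` plays the frozen text encoder, `augment` is a learnable per-dimension scaling, the inner-loop "self-supervised task" is per-view prediction entropy plus an L2 penalty on the augmentation parameters, and the outer-loop consistency loss is a TPT-style entropy of the prediction averaged over views. The outer update is first-order (it ignores how the adapted views depend on the prompt, as in first-order MAML), and gradients are central finite differences to keep the example dependency-free.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    return -sum(pi * math.log(pi + 1e-12) for pi in p)

def class_logits(x, prompt, class_emb):
    # Hypothetical CLIP stand-in: tanh(token + prompt) acts as a frozen text encoder,
    # and each class score is its dot product with the image feature x.
    return [sum(math.tanh(c + p) * xi for c, p, xi in zip(ce, prompt, x))
            for ce in class_emb]

def augment(x, a):
    # Hypothetical parameterized augmentation: learnable per-dimension scaling.
    return [xi * (1.0 + ai) for xi, ai in zip(x, a)]

def num_grad(f, v, eps=1e-5):
    # Central finite differences: fine for a toy, far too slow for a real model.
    g = []
    for i in range(len(v)):
        up, dn = v[:], v[:]
        up[i] += eps
        dn[i] -= eps
        g.append((f(up) - f(dn)) / (2 * eps))
    return g

def aux_loss(x, a, prompt, class_emb, lam=0.1):
    # Hypothetical self-supervised task for one view: keep its prediction
    # confident while penalizing large distortions of the input.
    p = softmax(class_logits(augment(x, a), prompt, class_emb))
    return entropy(p) + lam * sum(ai * ai for ai in a)

def consistency_loss(x, prompt, class_emb, views):
    # TPT-style objective: entropy of the prediction averaged over all views.
    probs = [softmax(class_logits(augment(x, a), prompt, class_emb)) for a in views]
    avg = [sum(col) / len(probs) for col in zip(*probs)]
    return entropy(avg)

def meta_tpt_step(x, prompt, class_emb, views,
                  inner_steps=5, inner_lr=0.1, outer_lr=0.1):
    # Inner loop: adapt each view's augmentation parameters on the auxiliary task.
    views = [a[:] for a in views]
    for _ in range(inner_steps):
        for j, a in enumerate(views):
            g = num_grad(lambda v: aux_loss(x, v, prompt, class_emb), a)
            views[j] = [ai - inner_lr * gi for ai, gi in zip(a, g)]
    # Outer loop (first-order): tune the prompt for consistency across adapted views.
    g = num_grad(lambda p: consistency_loss(x, p, class_emb, views), prompt)
    new_prompt = [pi - outer_lr * gi for pi, gi in zip(prompt, g)]
    return new_prompt, views

# Toy usage: 3 classes, 4-dimensional features, 3 views of one test sample.
class_emb = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -0.5, 0.5], [0.5, 0.5, 0.0, 0.0]]
x = [0.8, 0.2, 0.4, -0.1]
prompt = [0.0, 0.0, 0.0, 0.0]
views = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0], [0.0, 0.0, -0.1, 0.0]]
new_prompt, new_views = meta_tpt_step(x, prompt, class_emb, views)
```

In a real setting the finite differences would be replaced by automatic differentiation, and a full meta-learning treatment would backpropagate the outer loss through the inner-loop updates instead of using the first-order shortcut.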

Yuqing Lei, Yingjun Du, Yawen Huang, Xiantong Zhen, Ling Shao • 2025

Related benchmarks

Task                  Dataset         Metric          Result (%)  Rank
Image Classification  UCF101          Top-1 Accuracy  72.24       527
Image Classification  DTD             Accuracy        48.88       487
Image Classification  Food101         Accuracy        87.61       457
Image Classification  SUN397          Accuracy        69.17       450
Image Classification  StanfordCars    Accuracy        69.5        384
Image Classification  OxfordPets      Accuracy        92.79       298
Image Classification  FGVCAircraft    Accuracy        29.05       289
Image Classification  Caltech101      Accuracy        94.9        228
Image Classification  EuroSAT         Accuracy        54.26       226
Image Classification  Oxford Flowers  Top-1 Accuracy  74.22       83

(10 of 13 benchmark rows shown.)
