
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

About

The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption via prompt learning for numerous downstream tasks. Prior work has shown that test-time prompt tuning with entropy minimization can adapt text prompts to unseen domains. While effective, this overlooks the key cause of performance degradation on unseen domains -- distribution shift. In this work, we explicitly handle this problem by aligning the out-of-distribution (OOD) test sample statistics to those of the source data using prompt tuning. We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift, bridging the gap in the test domain. On the domain generalization benchmark, our method improves zero-shot top-1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe. In cross-dataset generalization with unseen categories across 10 datasets, our method improves consistently on all datasets compared to the existing state-of-the-art. Our source code and models are available at https://jameelhassan.github.io/promptalign.
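The core idea of aligning test-sample feature statistics to source statistics can be sketched as a simple loss. This is a minimal illustration, not the authors' implementation: it assumes precomputed per-dimension source means and variances and measures their L1 distance to the statistics of the test sample's features (in practice the gradient of such a loss would be used to update the learnable prompts).

```python
import numpy as np

def distribution_alignment_loss(test_feats, src_mean, src_var):
    """L1 distance between test-sample feature statistics and
    precomputed source statistics.

    test_feats: (n_tokens_or_views, dim) features from one test sample
    src_mean, src_var: (dim,) statistics gathered offline on source data
    """
    t_mean = test_feats.mean(axis=0)
    t_var = test_feats.var(axis=0)
    # Smaller loss means the test feature distribution is closer to the source.
    return np.abs(t_mean - src_mean).mean() + np.abs(t_var - src_var).mean()

# Toy usage: if the test statistics match the source statistics exactly,
# the loss is zero.
feats = np.array([[1.0, 2.0], [3.0, 4.0]])
loss = distribution_alignment_loss(feats, feats.mean(axis=0), feats.var(axis=0))
```

At test time, this alignment term would be minimized (optionally alongside an entropy objective) with respect to the prompt parameters only, leaving the backbone frozen.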

Jameel Hassan, Hanan Gani, Noor Hussein, Muhammad Uzair Khattak, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | ImageNet-1K | Top-1 Acc: 71.44 | 524 |
| Image Classification | EuroSAT | -- | 497 |
| Image Classification | Food-101 | -- | 494 |
| Image Classification | DTD | -- | 487 |
| Image Classification | Stanford Cars | -- | 477 |
| Image Classification | SUN397 | -- | 425 |
| Image Classification | DTD | Accuracy: 47.24 | 419 |
| Image Classification | UCF101 | Top-1 Acc: 84.11 | 404 |
| Image Classification | Food101 | Accuracy: 86.65 | 309 |
| Image Classification | StanfordCars | Accuracy: 68.5 | 266 |

(Showing 10 of 70 rows)
