
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models

About

Soft prompt learning has recently emerged as one of the methods of choice for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering from large accuracy degradation when tested on unseen classes from the same domain. To address this, in this paper, we make the following four contributions: (1) To alleviate base class overfitting, we propose a novel Language-Aware Soft Prompting (LASP) learning method by means of a text-to-text cross-entropy loss that maximizes the probability of the learned prompts to be correctly classified with respect to pre-defined hand-crafted textual prompts. (2) To increase the representation capacity of the prompts, we propose grouped LASP, where each group of prompts is optimized with respect to a separate subset of textual prompts. (3) We identify a visual-language misalignment introduced by prompt learning and LASP, and, more importantly, propose a re-calibration mechanism to address it. (4) We show that LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available, further increasing the robustness of the learned prompts. Through evaluations on 11 datasets, we show that our approach (a) significantly outperforms all prior works on soft prompting, and (b) matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets. Code will be made available at https://www.adrianbulat.com/lasp
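The abstract does not spell out the loss, so the following is only a minimal pure-Python sketch of a text-to-text cross-entropy of the kind described in contribution (1). It assumes one learned-prompt text embedding and one hand-crafted-prompt text embedding per class, cosine-similarity logits, and a temperature `tau` (0.01 here, an assumed value). The hand-crafted embeddings act as classifier weights, and each learned embedding should be classified as its own class. The names `text_to_text_ce` and `grouped_text_to_text_ce`, the toy vectors, and the simplified grouping (one hand-crafted embedding per class per group) are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def text_to_text_ce(
    learned: List[Vector],
    handcrafted: List[Vector],
    tau: float = 0.01,
) -> float:
    """Mean text-to-text cross-entropy (sketch).

    learned[i]     : text embedding of class i built from the learned soft prompt
    handcrafted[i] : text embedding of class i built from a hand-crafted prompt
    The hand-crafted embeddings play the role of classifier weights, and each
    learned embedding should be classified as its own class i.
    tau is an assumed temperature (0.01, i.e. a logit scale of 100).
    """
    if len(learned) != len(handcrafted):
        raise ValueError("need one hand-crafted embedding per class")
    total = 0.0
    for i, w in enumerate(learned):
        logits = [cosine(w, h) / tau for h in handcrafted]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[i]  # equals -log softmax(logits)[i]
    return total / len(learned)


def grouped_text_to_text_ce(
    learned_groups: List[List[Vector]],
    handcrafted_groups: List[List[Vector]],
    tau: float = 0.01,
) -> float:
    """Hypothetical grouped variant: group g of learned prompts is scored only
    against its own subset g of hand-crafted prompts; group losses are averaged."""
    losses = [
        text_to_text_ce(lg, hg, tau)
        for lg, hg in zip(learned_groups, handcrafted_groups)
    ]
    return sum(losses) / len(losses)


# Toy 2-class example with 2-D embeddings (made-up numbers, not from the paper).
handcrafted = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]   # each learned prompt points at its own class
swapped = [[0.1, 1.0], [1.0, 0.1]]   # each learned prompt points at the other class
print(text_to_text_ce(aligned, handcrafted))   # close to 0
print(text_to_text_ce(swapped, handcrafted))   # large, about 89.55
```

Because this loss uses only text embeddings, a virtual class in the sense of contribution (4) would simply be one more entry appended to both `learned` and `handcrafted`, with no image required.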

Adrian Bulat, Georgios Tzimiropoulos• 2022

Related benchmarks

Task                        Dataset                          Metric              Result  Rank
Image Classification        Food101                          --                  --      457
Image Classification        StanfordCars                     --                  --      312
Domain Generalization       VLCS                             Accuracy            87.25   238
Domain Generalization       PACS                             Accuracy            97.02   231
Image Classification        Caltech101                       Base Accuracy       98.17   148
Image Classification        OxfordPets                       Base Accuracy       95.73   137
Image Classification        EuroSAT                          Base Accuracy       95      104
Base-to-New Generalization  Avg over 11 datasets             Base Score          83.18   90
Image Classification        11 datasets base-to-new average  Base Average Score  83.18   81
Action Recognition          UCF101                           Base Accuracy       85.53   75
Showing 10 of 38 rows
