
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models

About

Large pre-trained vision-language (VL) models can learn a new task with a handful of examples and generalize to a new task without fine-tuning. However, these VL models are hard to deploy for real-world applications due to their impractically huge sizes and slow inference speed. To address this limitation, we study prompt-based low-resource learning of VL tasks with our proposed method, FewVLM, which is relatively small compared with recent few-shot learners. For FewVLM, we pre-train a sequence-to-sequence transformer model with prefix language modeling (PrefixLM) and masked language modeling (MaskedLM). Furthermore, we analyze the effect of diverse prompts for few-shot tasks. Experimental results on VQA show that FewVLM with prompt-based learning outperforms Frozen, which is 31x larger than FewVLM, by 18.2 percentage points and achieves comparable results to a 246x larger model, PICa. In our analysis, we observe that (1) prompts significantly affect zero-shot performance but only marginally affect few-shot performance, (2) models with noisy prompts learn as quickly as models with hand-crafted prompts given larger training data, and (3) MaskedLM helps VQA tasks while PrefixLM boosts captioning performance. Our code is publicly available at https://github.com/woojeongjin/FewVLM.
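To make the two pre-training objectives named in the abstract more concrete, the following is a minimal sketch of how a caption might be turned into (input, target) text pairs under each objective. It covers only the text side; the actual model also conditions on image features, which are omitted here. The function names, the `<mask_N>` sentinel format, and the example caption are illustrative assumptions and are not taken from the FewVLM code base.

```python
def prefix_lm_example(tokens, split_at):
    """PrefixLM-style pair: the prefix is the encoder input and the
    remainder is the decoder target. `split_at` is chosen by the caller
    (in practice it would typically be sampled at random)."""
    if not 0 < split_at < len(tokens):
        raise ValueError("split_at must leave both parts non-empty")
    return tokens[:split_at], tokens[split_at:]


def masked_lm_example(tokens, mask_positions, sentinel="<mask_{}>"):
    """MaskedLM-style pair: each run of consecutive masked positions is
    replaced by one sentinel in the source; the target lists each sentinel
    followed by the tokens it hides. The sentinel format is hypothetical."""
    masked = set(mask_positions)
    source, target = [], []
    span_id = 0
    prev_masked = False
    for i, tok in enumerate(tokens):
        if i in masked:
            if not prev_masked:
                # Start of a new masked span: emit a fresh sentinel.
                span_id += 1
                name = sentinel.format(span_id)
                source.append(name)
                target.append(name)
            target.append(tok)
            prev_masked = True
        else:
            source.append(tok)
            prev_masked = False
    return source, target


if __name__ == "__main__":
    caption = "a dog catches a frisbee in the park".split()
    print(prefix_lm_example(caption, 3))
    print(masked_lm_example(caption, [1, 4]))
```

In this sketch the PrefixLM target is a free-form continuation, which resembles caption generation, while the MaskedLM target is a short fill-in for a marked slot, which resembles producing a brief answer. That is one intuitive reading of observation (3) in the abstract, not a claim made by the authors.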

Woojeong Jin, Yu Cheng, Yelong Shen, Weizhu Chen, Xiang Ren • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual Question Answering | VQA v2 | Accuracy 51.1 | 1362 |
| Visual Question Answering | GQA | Accuracy 35.7 | 1249 |
| Visual Question Answering | VQA v2 (test-dev) | Overall Accuracy 47.7 | 706 |
| Visual Question Answering | OK-VQA (test) | Accuracy 23.1 | 327 |
| 5-way Classification | miniImageNet (test) | -- | 231 |
| Visual Question Answering | GQA (test-dev) | Accuracy 29.3 | 184 |
| Visual Question Answering | VQAv2 | Accuracy 47.7 | 177 |
| Visual Question Answering | VQA v2 (val) | Accuracy 47.7 | 144 |
| Visual Question Answering | VQA 2.0 (val) | Accuracy (Overall) 51.1 | 143 |
| Image Captioning | nocaps (val) | -- | 115 |
Showing 10 of 19 rows
