Read-only Prompt Optimization for Vision-Language Few-shot Learning

About

In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models to downstream tasks. These methods adapt the pre-trained models by introducing learnable prompts while keeping the pre-trained weights frozen. However, learnable prompts can alter the internal representations within the self-attention module, which may increase performance variance and hurt generalization, especially in data-deficient settings. To address these issues, we propose a novel approach, Read-only Prompt Optimization (RPO). RPO uses masked attention to prevent the prompts from shifting the internal representations of the pre-trained model. Further, to facilitate the optimization of RPO, the read-only prompts are initialized based on special tokens of the pre-trained model. Our extensive experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain generalization while displaying better robustness. The proposed method also generalizes better in extremely data-deficient settings, while improving parameter efficiency and reducing computational overhead. Code is available at https://github.com/mlvlab/RPO.
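The read-only masking idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository has that). Assumptions: token vectors serve directly as queries, keys and values (no learned projections); the read-only prompts are appended after the original tokens; each prompt attends to every original token and to itself but not to the other prompts; and the helper names `read_only_mask`, `masked_attention` and `init_prompts` are hypothetical. The initialization (copies of one special-token embedding plus small Gaussian noise) is likewise an assumed stand-in for "initialized based on special tokens".

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def read_only_mask(num_orig: int, num_prompts: int) -> List[List[bool]]:
    """mask[i][j] is True when query token i may attend to key token j.

    Assumed layout: original tokens first, read-only prompts appended after.
    Original tokens never attend to prompts, so prompts cannot shift their
    representations. Each prompt reads all original tokens plus itself
    (assumption: prompts do not attend to one another).
    """
    n = num_orig + num_prompts
    return [[j < num_orig or i == j for j in range(n)] for i in range(n)]


def masked_attention(x: Matrix, mask: List[List[bool]]) -> Matrix:
    """Single-head scaled dot-product attention with a boolean mask.

    Simplification: queries, keys and values are the token vectors
    themselves (identity projections), unlike a real transformer layer.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, k in enumerate(x)
        ]
        peak = max(scores)  # finite: every row allows at least one key
        weights = [math.exp(s - peak) for s in scores]  # masked keys get weight 0.0
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) / total for c in range(d)])
    return out


def init_prompts(special_token: List[float], num_prompts: int,
                 noise_std: float = 0.01, seed: int = 0) -> Matrix:
    """Hypothetical initialization: copies of a special-token embedding
    (e.g. a [CLS]-style token) plus small Gaussian noise so prompts differ."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_std) for v in special_token]
            for _ in range(num_prompts)]


# Toy "frozen" token embeddings; row 0 plays the role of the special token.
orig = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.8]]
prompts = init_prompts(orig[0], num_prompts=2)

with_prompts = masked_attention(orig + prompts, read_only_mask(len(orig), len(prompts)))
without_prompts = masked_attention(orig, read_only_mask(len(orig), 0))

# The original tokens' outputs are unchanged by the read-only prompts.
for a, b in zip(with_prompts[:len(orig)], without_prompts):
    assert all(math.isclose(u, v) for u, v in zip(a, b))
```

Because the original tokens never attend to the prompts, their outputs match a forward pass without prompts, which is the "no representation shift" property described above; only the prompt rows read information from the frozen features.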

Dongjun Lee, Seokwon Song, Jihee Suh, Joonmyung Choi, Sanghyeok Lee, Hyunwoo J. Kim • 2023

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet V2	--	749
Image Classification	ImageNet-R	Top-1 Acc 76.47	581
Image Classification	Flowers102	--	558
Image Classification	Food101	--	457
Image Classification	StanfordCars	--	384
Image Classification	ImageNet	Top-1 Accuracy 71.67	366
Image Classification	FGVC-Aircraft (test)	Accuracy 37.33	322
Image Classification	FGVCAircraft	--	289
Image Classification	Caltech101 (test)	--	204
Image Classification	OxfordPets	H Score 96.05	182

Showing 10 of 62 rows
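Assuming the "H Score" column follows the base-to-new protocol used by CoCoOp, H is the harmonic mean of base-class and new-class accuracy, which penalizes a method that does well on one split and poorly on the other. A minimal Python sketch; the function name `harmonic_mean` and the input numbers are illustrative and not taken from the benchmark:

```python
def harmonic_mean(base_acc: float, new_acc: float) -> float:
    """H = 2 * base * new / (base + new), with accuracies in percent."""
    if base_acc + new_acc == 0:
        return 0.0
    return 2 * base_acc * new_acc / (base_acc + new_acc)


# Hypothetical accuracies: strong on base classes, weaker on new classes.
print(round(harmonic_mean(80.0, 60.0), 2))  # 68.57, below the arithmetic mean of 70.0
```

The harmonic mean is used instead of the arithmetic mean so that a gain on base classes cannot hide a drop on new classes.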
