Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

About

Prompt tuning is an effective way to adapt the pre-trained visual-language model (VLM) to the downstream task using task-related textual tokens. Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain specific textual knowledge. However, the specific textual knowledge is the worse generalization to the unseen classes because it forgets the essential general textual knowledge having a strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes. The key insight of KgCoOp is that forgetting about essential knowledge can be alleviated by reducing the discrepancy between the learnable prompt and the hand-crafted prompt. Especially, KgCoOp minimizes the discrepancy between the textual embeddings generated by learned prompts and the hand-crafted prompts. Finally, adding the KgCoOp upon the contrastive loss can make a discriminative prompt for both seen and unseen tasks. Extensive evaluation of several benchmarks demonstrates that the proposed Knowledge-guided Context Optimization is an efficient method for prompt tuning, \emph{i.e.,} achieves better performance with less training time.

Hantao Yao, Rui Zhang, Changsheng Xu• 2023

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet V2	--	749
Image Classification	Stanford Cars	Accuracy74.8	660
Image Classification	DTD	Accuracy46.35	599
Image Classification	ImageNet-R	Top-1 Acc76.7	581
Image Classification	Food-101	Accuracy86.36	570
Image Classification	EuroSAT	Accuracy46.04	569
Image Classification	Flowers102	Accuracy95.62	558
Image Classification	UCF101	Top-1 Acc83.72	527
Classification	Cars	Accuracy74.8	492
Image Classification	DTD	Accuracy69.52	487

Showing 10 of 359 rows

...

Other info

Follow for update

@wizwand_team Discord