Plug-and-play Class-aware Knowledge Injection for Prompt Learning with Visual-Language Model

About

Prompt learning has become an effective and widely used technique in enhancing vision-language models (VLMs) such as CLIP for various downstream tasks, particularly in zero-shot classification within specific domains. Existing methods typically focus on either learning class-shared prompts for a given domain or generating instance-specific prompts through conditional prompt learning. While these methods have achieved promising performance, they often overlook class-specific knowledge in prompt design, leading to suboptimal outcomes. The underlying reasons are: 1) class-specific prompts offer more fine-grained supervision compared to coarse class-shared prompts, which helps prevent misclassification of data from different classes into a single class; 2) compared to class-specific prompts, instance-specific prompts neglect the richer class-level information across multiple instances, potentially causing data from the same class to be divided into multiple classes. To effectively supplement the class-specific knowledge into existing methods, we propose a plug-and-play Class-Aware Knowledge Injection (CAKI) framework. CAKI comprises two key components, i.e., class-specific prompt generation and query-key prompt matching. The former encodes class-specific knowledge into prompts from few-shot samples that belong to the same class and stores the learned prompts in a class-level knowledge bank. The latter provides a plug-and-play mechanism for each test instance to retrieve relevant class-level knowledge from the knowledge bank and inject such knowledge to refine model predictions. Extensive experiments demonstrate that our CAKI effectively improves the performance of existing methods on base and novel classes. Code is publicly available at https://github.com/yjh576/CAKI.
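The two components described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of a mean few-shot feature as the key, the cosine-similarity matching, the softmax weighting, and the `top_k` and `temperature` parameters are all assumptions made for illustration. In CAKI the prompts are learned; here they are fixed placeholder vectors.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def build_knowledge_bank(few_shot_features, class_prompts):
    """Hypothetical bank: one (key, prompt) pair per class.

    The key is the normalized mean of the class's few-shot features
    (an assumption); the prompt stands in for a learned class-specific prompt.
    """
    bank = {}
    for cls, feats in few_shot_features.items():
        dim = len(feats[0])
        mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
        bank[cls] = (l2_normalize(mean), class_prompts[cls])
    return bank

def retrieve_prompt(query_feature, bank, top_k=2, temperature=0.1):
    """Match a test-instance feature (query) against class keys.

    Returns the top-k classes, their softmax weights, and the weighted
    mix of their prompts, which a host method could then inject.
    """
    q = l2_normalize(query_feature)
    # Cosine similarity between the query and every class key.
    sims = {cls: sum(a * b for a, b in zip(q, key)) for cls, (key, _) in bank.items()}
    top = sorted(sims, key=sims.get, reverse=True)[:top_k]
    # Numerically stable softmax over the retrieved classes only.
    best = max(sims[c] for c in top)
    exps = {c: math.exp((sims[c] - best) / temperature) for c in top}
    total = sum(exps.values())
    weights = {c: exps[c] / total for c in top}
    dim = len(bank[top[0]][1])
    mixed = [sum(weights[c] * bank[c][1][i] for c in top) for i in range(dim)]
    return top, weights, mixed

# Toy usage with made-up 2-D features and 3-D placeholder prompts.
few_shot = {"cat": [[1.0, 0.0], [0.9, 0.1]], "dog": [[0.0, 1.0], [0.1, 0.9]]}
prompts = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
bank = build_knowledge_bank(few_shot, prompts)
top, weights, mixed = retrieve_prompt([0.95, 0.05], bank)
print(top, weights, mixed)
```

Because retrieval only reads from the bank, a sketch like this can sit beside an existing prompt-learning method without changing its training loop, which is the sense in which the abstract calls the mechanism plug-and-play.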

Junhui Yin, Nan Pu, Xinyu Zhang, Lingfeng Yang, Lin Wu, Xiaojie Wang, Zhun Zhong • 2026

Related benchmarks

Task	Dataset	Result
Classification	Cars	Accuracy 90.6	571
Image Classification	UCF101	Top-1 Acc 87.3	529
Semantic segmentation	Cityscapes	mIoU 58.6	526
Image Classification	Pets	Accuracy 95.2	320
Image Classification	Food101	Accuracy 87.8	177
Image Classification	UCF101	Base Classes Acc 79.3	139
Object Detection	Cityscapes	--	136
Image Classification	SUN397	Accuracy 78.4	116
Semantic segmentation	Mapillary	mIoU 57.5	112
Image Classification	Aircraft	Accuracy 59.1	78

Showing 10 of 45 rows
