Plug-and-play Class-aware Knowledge Injection for Prompt Learning with Visual-Language Model

About

Prompt learning has become an effective and widely used technique for enhancing vision-language models (VLMs) such as CLIP on various downstream tasks, particularly zero-shot classification within specific domains. Existing methods typically focus on either learning class-shared prompts for a given domain or generating instance-specific prompts through conditional prompt learning. While these methods have achieved promising performance, they often overlook class-specific knowledge in prompt design, leading to suboptimal outcomes. The underlying reasons are: 1) class-specific prompts offer more fine-grained supervision than coarse class-shared prompts, which helps prevent data from different classes being misclassified into a single class; 2) compared to class-specific prompts, instance-specific prompts neglect the richer class-level information shared across multiple instances, potentially causing data from the same class to be split across multiple classes. To supplement existing methods with class-specific knowledge, we propose a plug-and-play Class-Aware Knowledge Injection (CAKI) framework. CAKI comprises two key components: class-specific prompt generation and query-key prompt matching. The former encodes class-specific knowledge into prompts from few-shot samples that belong to the same class and stores the learned prompts in a class-level knowledge bank. The latter provides a plug-and-play mechanism for each test instance to retrieve relevant class-level knowledge from the knowledge bank and inject it to refine model predictions. Extensive experiments demonstrate that CAKI effectively improves the performance of existing methods on base and novel classes. Code is publicly available at https://github.com/yjh576/CAKI.
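The two components described above (a class-level knowledge bank built from few-shot samples, and query-key matching at test time) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how prompts are learned, which similarity measure is used, or how retrieved knowledge is injected. Here we assume, purely for illustration, that each class key is the mean of its few-shot feature vectors, that matching uses cosine similarity, and that injection adds a weighted similarity to the base model's class scores. All function names and the `weight` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_bank(few_shot):
    """Build a toy class-level knowledge bank.

    few_shot maps a class name to a list of feature vectors.
    ASSUMPTION: the key for each class is the mean feature vector; in the
    paper the stored entries are learned prompts, which this sketch omits.
    """
    bank = {}
    for name, feats in few_shot.items():
        dim = len(feats[0])
        bank[name] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return bank


def retrieve(bank, query, top_k=1):
    """Return the top_k (class, similarity) pairs whose keys best match the query."""
    scored = [(name, cosine(query, key)) for name, key in bank.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def refine(base_scores, bank, query, weight=0.5):
    """ASSUMPTION: inject class knowledge by adding weighted similarity to base scores."""
    return {
        cls: (score + weight * cosine(query, bank[cls])) if cls in bank else score
        for cls, score in base_scores.items()
    }


# Toy data: two classes, two few-shot samples each, 2-D features.
few_shot = {
    "cat": [[1.0, 0.0], [0.9, 0.1]],
    "dog": [[0.0, 1.0], [0.1, 0.9]],
}
bank = build_bank(few_shot)

query = [0.8, 0.2]                      # feature of a test instance
base_scores = {"cat": 0.4, "dog": 0.5}  # a base model that slightly prefers "dog"

best_match = retrieve(bank, query, top_k=1)[0][0]
refined = refine(base_scores, bank, query)
prediction = max(refined, key=refined.get)
print(best_match, prediction)
```

In this toy setup the base scores favor "dog", but the query lies much closer to the "cat" key, so the injected class-level similarity changes the final prediction. Because the refinement only adjusts the scores produced by an existing model, it can be attached to any base method, which is the sense in which the sketch is plug-and-play.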

Junhui Yin, Nan Pu, Xinyu Zhang, Lingfeng Yang, Lin Wu, Xiaojie Wang, Zhun Zhong • 2026

Related benchmarks

Task                    Dataset      Result                 Rank
Image Classification    UCF101       Top-1 Acc 87.3         527
Semantic segmentation   Cityscapes   mIoU 58.6              494
Classification          Cars         Accuracy 90.6          492
Image Classification    Pets         Accuracy 95.2          308
Image Classification    Food101      Accuracy 87.8          177
Image Classification    UCF101       Base Classes Acc 79.3  139
Object Detection        Cityscapes   --                     136
Image Classification    SUN397       Accuracy 78.4          116
Semantic segmentation   Mapillary    mIoU 57.5              85
Image Classification    Food101      Base Accuracy 91.8     69
Showing 10 of 45 rows
