Neutral-Reference Prompting for Vision-Language Models

About

Efficient transfer learning of vision-language models (VLMs) commonly suffers from a Base-New Trade-off (BNT): improving performance on unseen (new) classes often degrades accuracy on known (base) classes. Boosting recognition of unseen classes without sacrificing known-class performance remains a central challenge. Existing work often simplistically attributes the BNT to overfitting on known classes. We observe that VLMs frequently exhibit asymmetric confusion on certain downstream data, i.e., samples of class A are systematically mispredicted as class B, while the reverse confusion (B to A) rarely occurs. For known classes, this bias can be mitigated by tuning with a cross-entropy loss, but for unseen classes, the pretraining-induced bias persists and harms generalization. Motivated by this, we propose NeRP, a plug-and-play prompting correction strategy that improves discrimination on unseen classes without modifying model parameters. NeRP leverages neutral text prompts and reference images to measure class-wise prior preferences along the pre-trained inter-class geometry, and combines them with the sample likelihood to obtain the model's surrogate score. If, for a given sample, the prior strongly favors the current prediction while the observed evidence is clearly insufficient, we perform a local flip between easily confusable class pairs, thereby correcting prior-dominated mispredictions. Extensive experiments across multiple backbones and 15 few-shot and cross-domain benchmarks show that NeRP substantially improves accuracy on unseen classes while preserving known-class prediction performance.
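
The local flip described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact scoring formula, so the function name `local_flip`, the two thresholds, the `partner` mapping, and the use of simple score differences as "prior gap" and "evidence margin" are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def local_flip(
    scores: Sequence[Sequence[float]],
    prior: Sequence[float],
    partner: Dict[int, int],
    prior_gap_min: float = 0.5,        # hypothetical threshold
    evidence_margin_max: float = 0.2,  # hypothetical threshold
) -> List[int]:
    """Flip a prediction to its confusable partner when the prior dominates.

    scores  : per-sample class scores (stand-in for the sample likelihood)
    prior   : per-class prior preference (assumed to come from neutral
              prompts / reference images; how it is computed is not shown)
    partner : maps a class to the class it should be flipped to; it is
              one-directional to mirror the asymmetric confusion
    """
    corrected = []
    for row in scores:
        # Current prediction: index of the highest score.
        pred = max(range(len(row)), key=row.__getitem__)
        other = partner.get(pred)
        if other is not None:
            prior_gap = prior[pred] - prior[other]      # how much the prior favors pred
            evidence_margin = row[pred] - row[other]    # how much the sample favors pred
            if prior_gap > prior_gap_min and evidence_margin < evidence_margin_max:
                pred = other                            # prior-dominated: flip locally
        corrected.append(pred)
    return corrected


# Toy usage: class 0 is over-preferred by the prior and confusable with class 1.
toy_scores = [
    [2.0, 1.9, 0.1],  # weak evidence for class 0 -> flipped to 1
    [3.0, 1.0, 0.0],  # strong evidence for class 0 -> kept
    [0.1, 0.2, 2.0],  # class 2 has no partner -> kept
]
toy_prior = [1.0, 0.0, 0.0]
toy_result = local_flip(toy_scores, toy_prior, partner={0: 1})
```

In this sketch only predictions that fall on a listed confusable pair can change, and model parameters are never touched, which matches the plug-and-play framing of the abstract. The threshold values here are arbitrary.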

Senmao Tian, Xiang Wei, Shunli Zhang • 2026

Related benchmarks

Task                  Dataset               Metric          Result  Rank
Image Classification  UCF101                Top-1 Acc       70.43   527
Image Classification  ImageNet              Top-1 Accuracy  72.03   343
Image Classification  OxfordPets            Accuracy        91.43   298
Image Classification  FGVC Aircraft         Accuracy        26.5    223
Image Classification  OxfordPets            H Score         96.79   182
Image Classification  Food101               Accuracy        86.5    177
Image Classification  SUN397                Accuracy        68.25   116
Image Classification  Stanford Cars         Top-1 Accuracy  66.6    104
Image Classification  Average 11 datasets   Base Accuracy   85.68   95
Image Classification  EuroSAT Base-to-New   Base Score      95.6    87
Showing 10 of 21 rows
