Decoupling Template Bias in CLIP: Harnessing Empty Prompts for Enhanced Few-Shot Learning

About

The Contrastive Language-Image Pre-Training (CLIP) model excels in few-shot learning by aligning visual and textual representations. Our study shows that template-sample similarity (TSS), defined as the resemblance between a text template and an image sample, introduces bias. This bias leads the model to rely on template proximity rather than true sample-to-category alignment, reducing both accuracy and robustness in classification. We present a framework that uses empty prompts, textual inputs that convey the idea of "emptiness" without category information. These prompts capture unbiased template features and offset TSS bias. The framework employs two stages. During pre-training, empty prompts reveal and reduce template-induced bias within the CLIP encoder. During few-shot fine-tuning, a bias calibration loss enforces correct alignment between images and their categories, ensuring the model focuses on relevant visual cues. Experiments across multiple benchmarks demonstrate that our template correction method significantly reduces performance fluctuations caused by TSS, yielding higher classification accuracy and stronger robustness. The repository of this project is available at https://github.com/zhenyuZ-HUST/Decoupling-Template-Bias-in-CLIP.
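To make the idea of template-sample similarity more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that image and text embeddings have already been extracted from a CLIP-like encoder (random vectors stand in for them here), that the embedding of an empty prompt can serve as a proxy for the template feature, and that the template contribution can be removed by a simple subtraction followed by re-normalization. The function names, the `alpha` weight, and the subtraction rule are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def template_sample_similarity(image_emb, empty_emb):
    """Cosine similarity between each image and the empty-prompt embedding.

    Assumption: the empty-prompt embedding stands in for the template feature.
    """
    return l2_normalize(image_emb) @ l2_normalize(empty_emb)


def decoupled_logits(image_emb, class_emb, empty_emb, alpha=1.0):
    """Cosine logits after subtracting the template direction from class text features.

    Assumption: a plain feature-space subtraction is used here for illustration;
    the paper's actual correction and calibration loss are not reproduced.
    """
    img = l2_normalize(image_emb)
    corrected = l2_normalize(l2_normalize(class_emb) - alpha * l2_normalize(empty_emb))
    return img @ corrected.T


# Random stand-ins for encoder outputs: 4 images, 3 classes, 8-dim embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))
class_emb = rng.normal(size=(3, 8))   # e.g. embeddings of "a photo of a {class}"
empty_emb = rng.normal(size=8)        # e.g. embedding of an empty prompt

tss = template_sample_similarity(image_emb, empty_emb)      # shape (4,)
logits = decoupled_logits(image_emb, class_emb, empty_emb)  # shape (4, 3)
predictions = logits.argmax(axis=1)
```

With `alpha=0` the function reduces to ordinary cosine-similarity logits, which makes it easy to compare predictions with and without the assumed correction.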

Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Zhimeng Huang, Yuhua Li • 2025

Related benchmarks

Task                  Dataset             Result               Rank
Image Classification  EuroSAT             --                   497
Image Classification  UCF101              Top-1 Acc 85         404
Image Classification  Oxford-IIIT Pets    Accuracy 94.7        259
Image Classification  FGVC Aircraft       Top-1 Accuracy 49.4  185
Image Classification  Caltech-101         Top-1 Accuracy 96    146
Image Classification  Food                --                   92
Image Classification  Oxford 102 Flowers  Top-1 Accuracy 96.9  68
Image Classification  Aircraft            Top-1 Acc 57.4       43
Image Classification  Pets                Top-1 Accuracy 94.6  29
Image Classification  Food101             Top-1 Accuracy 87.8  24
Showing 10 of 13 rows
