Towards Calibrating Prompt Tuning of Vision-Language Models

About

Prompt tuning of large-scale vision-language models such as CLIP enables efficient task adaptation without updating model weights. However, it often leads to poor confidence calibration and unreliable predictive uncertainty. We address this problem by proposing a calibration framework that enhances predictive reliability while preserving the geometry of the pretrained CLIP embedding space, which is required for robust generalization. Our approach extends the standard cross-entropy loss with two complementary regularizers: (1) a mean-variance margin penalty that stabilizes inter-class logit margins by maximizing their average while minimizing dispersion, mitigating underconfidence and overconfidence spikes; and (2) a text moment-matching loss that aligns the first and second moments of tuned text embeddings with their frozen CLIP counterparts, preserving semantic dispersion crucial for generalization. Through extensive experiments across 7 prompt-tuning methods and 11 diverse datasets, we demonstrate that our approach significantly reduces the Expected Calibration Error (ECE) compared to competitive calibration techniques on both base and novel classes.
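The combined objective described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the margin definition (true-class logit minus the largest other-class logit), the use of the covariance matrix as the second moment, and the names `var_weight`, `lam_margin`, and `lam_text` are all assumptions made for the example.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for logits of shape (N, C) and integer labels of shape (N,)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def mean_variance_margin_penalty(logits, labels, var_weight=1.0):
    """Reward a large average margin and penalize its dispersion.

    Assumption: the margin of a sample is its true-class logit minus the
    largest logit among the other classes.
    """
    logits = np.array(logits, dtype=float)  # float copy, safe to overwrite
    labels = np.asarray(labels)
    rows = np.arange(len(labels))
    true_logit = logits[rows, labels].copy()
    logits[rows, labels] = -np.inf  # mask out the true class
    margins = true_logit - logits.max(axis=1)
    return -margins.mean() + var_weight * margins.var()


def text_moment_matching_loss(tuned, frozen):
    """Squared gap between the mean and covariance of two (C, D) embedding sets.

    Assumption: the 'second moment' is taken to be the covariance matrix
    computed across the C class embeddings.
    """
    tuned = np.asarray(tuned, dtype=float)
    frozen = np.asarray(frozen, dtype=float)
    mean_gap = np.sum((tuned.mean(axis=0) - frozen.mean(axis=0)) ** 2)
    cov_gap = np.sum((np.cov(tuned, rowvar=False) - np.cov(frozen, rowvar=False)) ** 2)
    return mean_gap + cov_gap


def total_loss(logits, labels, tuned, frozen, lam_margin=0.1, lam_text=0.1):
    """Cross-entropy plus the two regularizers (weights are hypothetical)."""
    return (
        softmax_cross_entropy(logits, labels)
        + lam_margin * mean_variance_margin_penalty(logits, labels)
        + lam_text * text_moment_matching_loss(tuned, frozen)
    )
```

In this sketch the text term vanishes when the tuned embeddings equal the frozen ones, so it only penalizes drift away from the pretrained embedding statistics.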

Ashshak Sharifdeen, Fahad Shamshad, Muhammad Akhtar Munir, Abhishek Basu, Mohamed Insaf Ismithdeen, Jeyapriyan Jeyamohan, Chathurika Sewwandi Silva, Karthik Nandakumar, Muhammad Haris Khan • 2026

Related benchmarks

Task	Dataset	Result
Fine-grained Image Classification	DTD (novel classes)	ECE 3.3	36
Image Classification	Food101 (novel classes)	Accuracy 0.9168	36
Fine-grained Image Classification	FGVCAircraft (novel classes)	ECE 5.36	36
Fine grained classification	SUN397 (novel classes)	ECE 0.77	28
Fine-grained Image Classification	Caltech101 (novel classes)	ECE 1.03	28
Fine-grained Image Classification	OxfordPets (novel classes)	ECE 1.19	28
Fine-grained Image Classification	Flowers102 (novel classes)	ECE 3.51	28
Fine-grained Image Classification	UCF101 (novel classes)	ECE 1.89	28
Fine grained classification	EuroSAT (novel classes)	ECE 4.15	28
Fine-grained Image Classification	StanfordCars (novel classes)	ECE 1.98	28

Showing 10 of 54 rows
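For readers unfamiliar with the ECE metric reported in the table, the following is a minimal sketch of the common equal-width binned estimator. The bin count of 10 and the function name are assumptions for illustration; the listing does not state which binning the paper uses.

```python
import numpy as np


def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: top-class probability per sample, in (0, 1].
    predictions, labels: predicted and true class indices per sample.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)  # half-open bin (lo, hi]
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return ece


# Two samples at 0.95 confidence (both correct), two at 0.65 (one correct).
example = expected_calibration_error(
    confidences=[0.95, 0.95, 0.65, 0.65],
    predictions=[0, 1, 0, 1],
    labels=[0, 1, 0, 0],
)
```

The function returns a fraction in [0, 1]. If the table values are percentages, which seems likely but is not stated in the listing, the result would be multiplied by 100 for comparison.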
