SoC: Semantic Orthogonal Calibration for Test-Time Prompt Tuning

About

With the increasing adoption of vision-language models (VLMs) in critical decision-making systems such as healthcare or autonomous driving, the calibration of their uncertainty estimates becomes paramount. Yet this dimension has been largely underexplored in the VLM test-time prompt-tuning (TPT) literature, which has predominantly focused on improving discriminative performance. Recent state-of-the-art work advocates enforcing full orthogonality over pairs of text prompt embeddings to enhance separability, and therefore calibration. Nevertheless, as we theoretically show in this work, the gradients induced by fully orthogonal constraints strongly push semantically related classes apart, ultimately making the model overconfident. Based on these findings, we propose Semantic Orthogonal Calibration (SoC), a Huber-based regularizer that enforces smooth prototype separation while preserving semantic proximity, thereby improving calibration compared to prior orthogonality-based approaches. Through a comprehensive empirical validation, we demonstrate that SoC consistently improves calibration performance while maintaining competitive discriminative capabilities.

Leo Fillioux, Omprakash Chakraborty, Ismail Ben Ayed, Paul-Henry Cournède, Stergios Christodoulidis, Maria Vakalopoulou, Jose Dolz • 2026
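
The abstract contrasts a full-orthogonality constraint on text prototypes with a Huber-based regularizer. The sketch below is only an illustration of that contrast, not the authors' loss: it assumes both penalties act on off-diagonal cosine similarities between class prototypes, and the function names and the threshold `delta` are hypothetical. It shows why a Huber-shaped penalty might treat semantically close classes more gently: its gradient is capped at `delta`, whereas the gradient of a squared penalty grows with the similarity.

```python
import numpy as np


def pairwise_cosine(prototypes):
    """Cosine similarity matrix between row-wise class prototypes."""
    normed = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return normed @ normed.T


def off_diagonal(matrix):
    """Flattened off-diagonal entries of a square matrix."""
    mask = ~np.eye(matrix.shape[0], dtype=bool)
    return matrix[mask]


def huber(x, delta):
    """Standard Huber function: quadratic near zero, linear beyond delta."""
    abs_x = np.abs(x)
    return np.where(abs_x <= delta, 0.5 * x ** 2, delta * (abs_x - 0.5 * delta))


def huber_grad(x, delta):
    """Derivative of the Huber function; its magnitude never exceeds delta."""
    return np.clip(x, -delta, delta)


def full_orthogonality_penalty(prototypes):
    """Assumed baseline: mean squared off-diagonal cosine similarity."""
    return float(np.mean(off_diagonal(pairwise_cosine(prototypes)) ** 2))


def huber_separation_penalty(prototypes, delta=0.3):
    """Assumed Huber variant: mean Huber value of the same similarities."""
    sims = off_diagonal(pairwise_cosine(prototypes))
    return float(np.mean(huber(sims, delta)))


# Toy prototypes: the first two classes are deliberately close to each other.
toy = np.array([[1.0, 0.1, 0.0],
                [0.9, 0.3, 0.0],
                [0.0, 0.0, 1.0]])
print(full_orthogonality_penalty(toy), huber_separation_penalty(toy))
```

In this toy setting, a pair with cosine similarity 0.9 receives a gradient of 1.8 under the squared penalty but only 0.3 under the Huber penalty with `delta=0.3`, so highly similar prototypes are still separated, just with a bounded push.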

Related benchmarks

Task                  Dataset                 Result               Rank
Image Classification  Food-101                Accuracy 88.9        494
Image Classification  Flowers102              Accuracy 77          478
Image Classification  Stanford Cars           Accuracy 77          477
Image Classification  Food101                 --                   309
Image Classification  ImageNet V2 1.0 (test)  Top-1 Accuracy 68.9  54
Image Classification  ImageNet                Acc 74.5             45
Image Classification  FGVC Aircraft           Accuracy 30.9        32
Image Classification  UCF-101                 Accuracy 74.9        30
Image Classification  EuroSAT                 Accuracy 58.3        26
Image Classification  Oxford-IIIT Pets        Accuracy 93.9        26
Showing 10 of 26 rows