Intra-Class Probabilistic Embeddings for Uncertainty Estimation in Vision-Language Models

About

Vision-language models (VLMs), such as CLIP, have gained popularity for their strong open-vocabulary classification performance, but they are prone to assigning high confidence scores to misclassifications, limiting their reliability in safety-critical applications. We introduce a training-free, post-hoc uncertainty estimation method for contrastive VLMs that can be used to detect erroneous predictions. The key to our approach is to measure visual feature consistency within a class, using feature projection combined with multivariate Gaussians to create class-specific probabilistic embeddings. Our method is VLM-agnostic, requires no fine-tuning, demonstrates robustness to distribution shift, and works effectively with as few as 10 training images per class. Extensive experiments on ImageNet, Flowers102, Food101, EuroSAT, and DTD show state-of-the-art error detection performance, significantly outperforming both deterministic and probabilistic VLM baselines. Code is available at https://github.com/zhenxianglin/ICPE.
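
The general idea in the abstract (project image features, fit one multivariate Gaussian per class, and score a prediction by how well the image fits the predicted class) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. The PCA projection, the Mahalanobis-distance score, the regularizer `eps`, and the function names `fit_class_gaussians` and `class_uncertainty` are all assumptions made for illustration, and the synthetic vectors stand in for real VLM image embeddings.

```python
import numpy as np

def fit_class_gaussians(feats, labels, n_components=4, eps=1e-3):
    """Fit one Gaussian per class in a shared PCA subspace.

    PCA is an assumed choice of projection; the abstract only says
    "feature projection".
    """
    mean = feats.mean(axis=0)
    # PCA via SVD of the centered features; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(feats - mean, full_matrices=False)
    proj = vt[:n_components].T                     # shape (D, k)
    z = (feats - mean) @ proj                      # shape (N, k)
    gaussians = {}
    for c in np.unique(labels):
        zc = z[labels == c]
        mu = zc.mean(axis=0)
        # eps * I keeps the covariance invertible with few samples per class.
        cov = np.cov(zc, rowvar=False) + eps * np.eye(zc.shape[1])
        gaussians[int(c)] = (mu, np.linalg.inv(cov))
    return mean, proj, gaussians

def class_uncertainty(feat, pred_class, mean, proj, gaussians):
    """Squared Mahalanobis distance of a feature to its predicted class."""
    z = (feat - mean) @ proj
    mu, precision = gaussians[pred_class]
    d = z - mu
    return float(d @ precision @ d)

# Synthetic stand-in for image embeddings: 3 classes, 10 "shots" each.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 32)) * 5.0
labels = np.repeat(np.arange(3), 10)
feats = centers[labels] + rng.normal(size=(30, 32)) * 0.5

mean, proj, gaussians = fit_class_gaussians(feats, labels)
x = feats[0]                                       # a class-0 sample
u_correct = class_uncertainty(x, 0, mean, proj, gaussians)
u_wrong = class_uncertainty(x, 1, mean, proj, gaussians)
print(u_correct < u_wrong)
```

In this sketch, a larger score means the image looks less like other images of the predicted class, so the prediction is more likely to be an error. No fine-tuning of the VLM is involved; only the per-class statistics are estimated.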

Zhenxiang Lin, Maryam Haghighat, Will Browne, Dimity Miller • 2025

Related benchmarks

Task                             Dataset                                        Metric   Result   Rank
Out-of-Distribution Detection    DTD                                            AUROC    94.08    36
Error detection                  ImageNet                                       AUROC    88.57    35
Error detection                  Flowers102                                     AUROC    99.38    27
Error detection                  Food101                                        AUROC    95.06    27
Error detection                  EuroSAT                                        AUROC    98.1     27
Error detection                  ImageNet V2 (test)                             AUROC    86.19    7
Error detection                  ImageNet-C (test)                              AUROC    83.51    7
Error detection                  ImageNet-1k 1st-level superclasses 1.0 (test)  AUROC    77.34    7
Error detection                  ImageNet-1k 2nd-level superclasses 1.0 (test)  AUROC    70.64    7
Error detection                  ImageNet-A (test)                              AUROC    0.7034   7
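
For readers unfamiliar with the metric, error detection AUROC can be read as the probability that a randomly chosen misclassified sample receives a higher uncertainty score than a randomly chosen correctly classified one. The following is a minimal sketch of that definition, not the evaluation code behind the numbers above; the function name `error_detection_auroc` and the toy scores are hypothetical. Most rows above appear to report the value as a percentage, while the ImageNet-A row appears to use a 0 to 1 scale.

```python
import numpy as np

def error_detection_auroc(uncertainty, is_error):
    """AUROC for separating errors (positives) from correct predictions.

    Computed from all (error, correct) pairs; ties count as half.
    """
    u = np.asarray(uncertainty, dtype=float)
    e = np.asarray(is_error, dtype=bool)
    pos, neg = u[e], u[~e]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one error and one correct prediction")
    diff = pos[:, None] - neg[None, :]             # every error vs. every correct
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

# Toy example: 4 predictions, the last two are misclassified.
scores = [0.1, 0.4, 0.35, 0.8]
errors = [0, 0, 1, 1]
print(error_detection_auroc(scores, errors))       # 0.75
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; a rank-based formulation gives the same value more efficiently on large test sets.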