ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

About

Large-scale vision-language models (VLMs) like CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or a text sample is mapped to a single vector in the embedding space. This is problematic: as multiple samples (images or text) can abstract the same concept in the physical world, deterministic embeddings do not reflect the inherent ambiguity in the embedding space. We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing. On four challenging datasets, i.e., COCO, Flickr, CUB, and Oxford-flowers, we estimate the multi-modal embedding uncertainties for two VLMs, i.e., CLIP and BLIP, quantify the calibration of embedding uncertainties in retrieval tasks and show that ProbVLM outperforms other methods. Furthermore, we propose active learning and model selection as two real-world downstream tasks for VLMs and show that the estimated uncertainty aids both tasks. Lastly, we present a novel technique for visualizing the embedding distributions using a large-scale pre-trained latent diffusion model. Code is available at https://github.com/ExplainableML/ProbVLM.
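The idea of a post-hoc probabilistic adapter can be illustrated with a small sketch. This is not the authors' implementation: it assumes a diagonal Gaussian as the output distribution and a pair of tiny linear heads as the adapter, and the names `make_adapter`, `adapt`, and `gaussian_nll` are hypothetical. The paper's actual architecture, distribution family, and loss may differ. The frozen encoder is represented only by the fixed embedding vectors passed in.

```python
import math
import random


def make_adapter(dim, seed=0):
    """Create hypothetical linear heads (assumption: one for the mean shift,
    one for the per-dimension log-variance)."""
    rng = random.Random(seed)

    def small_matrix():
        return [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(dim)]

    return {"w_mu": small_matrix(), "w_logvar": small_matrix()}


def matvec(w, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def adapt(params, z):
    """Map a frozen point embedding z to a per-dimension (mean, variance)."""
    delta = matvec(params["w_mu"], z)
    # Residual mean (an assumption of this sketch): start from the frozen embedding.
    mu = [zi + di for zi, di in zip(z, delta)]
    logvar = matvec(params["w_logvar"], z)
    var = [math.exp(lv) for lv in logvar]  # exp keeps the variance positive
    return mu, var


def gaussian_nll(mu, var, target):
    """Negative log-likelihood of `target` under a diagonal Gaussian."""
    return sum(
        0.5 * (math.log(2 * math.pi * v) + (t - m) ** 2 / v)
        for m, v, t in zip(mu, var, target)
    )


# Toy frozen embeddings for a matching image/text pair (illustrative values only).
image_emb = [0.6, -0.8, 0.0, 0.0]
text_emb = [0.5, -0.7, 0.1, 0.0]

params = make_adapter(dim=4)
mu, var = adapt(params, image_emb)

intra_modal_loss = gaussian_nll(mu, var, image_emb)  # explain the image's own embedding
cross_modal_loss = gaussian_nll(mu, var, text_emb)   # explain the paired text embedding
uncertainty = sum(var) / len(var)                    # one scalar summary per sample
```

In a training loop, only the adapter parameters would be updated to reduce a combination of the two losses while the VLM stays frozen; the predicted variance then serves as the per-sample uncertainty estimate.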

Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata • 2023

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet-R (test)	--	170
Image Classification	Food101 (test)	--	97
Error detection	EuroSAT	AuROC 42.72	48
Error detection	Flowers102	AuROC 43.05	46
Out-of-Distribution Detection	DTD	AUROC 46.57	36
Error detection	ImageNet	--	36
Error detection	Food101	AuROC 52.87	29
Image Classification	ImageNet 1k (test)	Acc @ 90% Rej 77.9	18
Image-to-Text Retrieval Aleatoric Uncertainty Calibration	CUB	Calibration Score S 0.927	10
Image-to-Text Retrieval	CUB-200	Spearman Correlation (S) 0.636	10
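
The retrieval rows above report a Spearman correlation (S). As a rough illustration of how such a score can relate uncertainty to retrieval quality, the sketch below bins samples by uncertainty level and rank-correlates the level with Recall@1. The binning scheme, the sign convention, and the toy numbers are assumptions made for illustration; they are not results from the paper, and the listed scores may be computed differently.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: uncertainty bins from lowest (1) to highest (5) and a made-up
# Recall@1 per bin. Well-calibrated uncertainty should track retrieval errors,
# so recall is expected to fall as uncertainty rises.
uncertainty_level = [1, 2, 3, 4, 5]
recall_at_1 = [0.82, 0.74, 0.69, 0.55, 0.41]

s = spearman(uncertainty_level, recall_at_1)  # close to -1 for this monotone toy data
```

A strongly monotone relationship between uncertainty and recall gives a correlation near -1 or +1 depending on the sign convention, whereas uninformative uncertainty estimates give a value near 0.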

