Label Propagation for Zero-shot Classification with Vision-Language Models

About

Vision-Language Models (VLMs) have demonstrated impressive performance on zero-shot classification, i.e. classification when provided merely with a list of class names. In this paper, we tackle the case of zero-shot classification in the presence of unlabeled data. We leverage the graph structure of the unlabeled data and introduce ZLaP, a method based on label propagation (LP) that utilizes geodesic distances for classification. We tailor LP to graphs containing both text and image features and further propose an efficient method for performing inductive inference based on a dual solution and a sparsification step. We perform extensive experiments to evaluate the effectiveness of our method on 14 common datasets and show that ZLaP outperforms the latest related works. Code: https://github.com/vladan-stojnic/ZLaP

Vladan Stojni\'c, Yannis Kalantidis, Giorgos Tolias• 2024

Related benchmarks

Task	Dataset	Result
Image Classification	DTD	Accuracy51.8	599
Image Classification	EuroSAT	Accuracy60.9	569
Image Classification	UCF101	Top-1 Acc77.7	527
Classification	Cars	Accuracy72.1	492
Image Classification	CUB	Accuracy64.1	331
Fine-grained visual classification	FGVC-Aircraft (test)	Top-1 Acc26.3	312
Image Classification	Pets	Accuracy92.8	308
Image Classification	FGVCAircraft	Accuracy28.4	289
Image Classification	Food	Accuracy87.9	152
Image Classification	Flowers	Accuracy73.4	135

Showing 10 of 18 rows

Other info

Code

Follow for update

@wizwand_team Discord