Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model

About

Vision-language models (VLMs) have revolutionized machine learning by leveraging large pre-trained models to tackle various downstream tasks. Although label, training, and data efficiency have improved, many state-of-the-art VLMs still require task-specific hyperparameter tuning and fail to fully exploit test samples. To overcome these challenges, we propose a graph-based approach for label-efficient adaptation and inference. Our method dynamically constructs a graph over text prompts, few-shot examples, and test samples, using label propagation for inference without task-specific tuning. Unlike existing zero-shot label propagation techniques, our approach requires no additional unlabeled support set and effectively leverages the test sample manifold through dynamic graph expansion. We further introduce a context-aware feature re-weighting mechanism to improve task adaptation accuracy. Additionally, our method supports efficient graph expansion, enabling real-time inductive inference. Extensive evaluations on downstream tasks, such as fine-grained categorization and out-of-distribution generalization, demonstrate the effectiveness of our approach. The source code is available at https://github.com/Yushu-Li/ECALP.

Yushu Li, Yongyi Su, Adam Goodge, Kui Jia, Xun Xu • 2024
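The graph-based inference described in the abstract can be illustrated with a minimal sketch of generic label propagation over a k-nearest-neighbor graph. This is not the authors' implementation (see the linked repository for that). It assumes the classic normalized-affinity propagation rule, uses toy 2-D features in place of real VLM embeddings, and omits the dynamic graph expansion and context-aware feature re-weighting the abstract mentions. All function names and parameters (`knn_affinity`, `propagate_labels`, `k`, `alpha`, `iters`) are hypothetical.

```python
import numpy as np


def knn_affinity(features, k):
    """Symmetric kNN affinity matrix built from cosine similarity."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, 0.0)           # no self-loops
    sim = np.clip(sim, 0.0, None)        # keep non-negative edge weights
    top_k = np.argsort(-sim, axis=1)[:, :k]
    mask = np.zeros_like(sim, dtype=bool)
    mask[np.arange(sim.shape[0])[:, None], top_k] = True
    w = np.where(mask, sim, 0.0)
    return np.maximum(w, w.T)            # symmetrize (union of neighbor sets)


def propagate_labels(features, labels, num_classes, k=3, alpha=0.9, iters=50):
    """Generic iterative label propagation; labels == -1 marks unlabeled nodes."""
    w = knn_affinity(features, k)
    deg = w.sum(axis=1)
    deg[deg == 0] = 1.0                  # guard against isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    y = np.zeros((len(labels), num_classes))
    labeled = labels >= 0
    y[labeled, labels[labeled]] = 1.0    # one-hot seeds for labeled nodes

    f = y.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1.0 - alpha) * y
    return f.argmax(axis=1)


# Toy data (assumption): two "text prompt" prototypes act as labeled nodes,
# and ten unlabeled "test samples" cluster around them.
rng = np.random.default_rng(0)
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
test_a = prototypes[0] + 0.05 * rng.standard_normal((5, 2))
test_b = prototypes[1] + 0.05 * rng.standard_normal((5, 2))
features = np.vstack([prototypes, test_a, test_b])
labels = np.array([0, 1] + [-1] * 10)

pred = propagate_labels(features, labels, num_classes=2)
print(pred)
```

In this toy setting each unlabeled node inherits the label of the prototype it is connected to through the graph, which is the basic idea behind using the test-sample manifold for inference.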

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-10C Severity Level 5 (test)	Average Error Rate (Severity 5): 63.96	136
Image Classification	ImageNet-C Severity 5 (test)	Mean Error Rate (Severity 5): 25.16	132
Image Classification	CIFAR-100-C	Accuracy (Corruption): 50.73	109
Image Classification	Average 11 datasets	--	95
Image Classification	CIFAR-100-C v1 (test)	Error Rate (Average): 32.85	60
Image Classification	CIFAR-100C Level 5 (test)	Mean Accuracy (C5): 36.98	56
Image Classification	ImageNet-C 1.0 (test)	--	53
Image Classification	CIFAR10-C	Mean Accuracy (mAcc): 77.78	41
Image Classification	CIFAR-10-C v1 (test)	--	28
Image Classification	11 natural datasets transductive setting	Average Accuracy: 70.5	13

