Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Active Prompt Learning with Vision-Language Model Priors

About

Vision-language models (VLMs) have demonstrated remarkable zero-shot performance across various classification tasks. Nonetheless, their reliance on hand-crafted text prompts for each task hinders efficient adaptation to new tasks. While prompt learning offers a promising solution, most studies focus on maximizing the utilization of given few-shot labeled datasets, often overlooking the potential of careful data selection strategies, which enable higher accuracy with fewer labeled data. This motivates us to study a budget-efficient active prompt learning framework. Specifically, we introduce a class-guided clustering that leverages the pre-trained image and text encoders of VLMs, thereby enabling our cluster-balanced acquisition function from the initial round of active learning. Furthermore, considering the substantial class-wise variance in confidence exhibited by VLMs, we propose a budget-saving selective querying based on adaptive class-wise thresholds. Extensive experiments in active learning scenarios across seven datasets demonstrate that our method outperforms existing baselines.

Hoyoung Kim, Seokhee Jin, Changhwan Sung, Jaechang Kim, Jungseul Ok• 2024

Related benchmarks

TaskDatasetResultRank
Image ClassificationDTD
Accuracy72.97
485
Image ClassificationISIC
Accuracy65.7
31
Image ClassificationKaoKore
Accuracy61.9
20
Showing 3 of 3 rows

Other info

Follow for update