
UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models

About

In this study, we investigate the task of data pre-selection, which aims to select instances for labeling from an unlabeled dataset in a single pass, thereby optimizing performance for undefined downstream tasks under a limited annotation budget. Previous approaches to data pre-selection relied solely on visual features extracted from foundation models, such as CLIP and BLIP-2, and largely ignored the power of text features. In this work, we argue that, with proper design, the joint feature space of vision and text can yield a better representation for data pre-selection. To this end, we introduce UP-DP, a simple yet effective unsupervised prompt learning approach that adapts vision-language models, like BLIP-2, for data pre-selection. Specifically, with the BLIP-2 parameters frozen, we train text prompts to extract joint features with improved representation, ensuring a diverse cluster structure that covers the entire dataset. We extensively compare our method with the state of the art on seven benchmark datasets in different settings, achieving a performance gain of up to 20%. Interestingly, the prompts learned from one dataset generalize well and can be applied directly to enhance BLIP-2 feature extraction on other datasets. To the best of our knowledge, UP-DP is the first work to incorporate unsupervised prompt learning in a vision-language model for data pre-selection.
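To make the selection step concrete, here is a minimal sketch of cluster-based pre-selection. It is not the UP-DP method itself: it assumes the features have already been extracted (for example by a vision-language model), uses plain k-means as a stand-in for the learned cluster structure, and picks the instance nearest each centroid. The function name `preselect` and its parameters `features`, `budget`, and `iters` are hypothetical and chosen only for illustration.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def preselect(features, budget, iters=20):
    """Return `budget` indices to label: one representative per cluster.

    Illustrative assumption: k-means on precomputed features, not the
    authors' prompt-learning objective.
    """
    if not 0 < budget <= len(features):
        raise ValueError("budget must be between 1 and len(features)")

    # Farthest-point initialisation: deterministic, spreads centroids out.
    centroids = [list(features[0])]
    while len(centroids) < budget:
        far = max(features, key=lambda f: min(_dist(f, c) for c in centroids))
        centroids.append(list(far))

    for _ in range(iters):
        # Assign every feature vector to its nearest centroid.
        clusters = [[] for _ in centroids]
        for f in features:
            j = min(range(budget), key=lambda k: _dist(f, centroids[k]))
            clusters[j].append(f)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new = [_mean(c) if c else centroids[k] for k, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new

    # For each centroid, pick the closest instance not already selected.
    selected = []
    for c in centroids:
        candidates = [i for i in range(len(features)) if i not in selected]
        selected.append(min(candidates, key=lambda i: _dist(features[i], c)))
    return sorted(selected)
```

Calling `preselect(features, budget)` with a list of feature vectors returns the sorted indices of the instances to send for annotation, so the annotation budget maps directly to the number of clusters.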

Xin Li, Sima Behpour, Thang Doan, Wenbin He, Liang Gou, Liu Ren • 2023

Related benchmarks

Task                                Dataset           Metric    Result  Rank
Image Classification                EuroSAT           Accuracy  63.4    497
Image Classification                Flowers102        Accuracy  25.8    478
Action Recognition                  UCF101            Accuracy  28.9    365
Image Classification                Oxford-IIIT Pets  Accuracy  58.2    259
Texture Classification              DTD               Accuracy  40.3    108
Fine-grained Visual Categorization  FGVCAircraft      Accuracy  16.1    60
Object Recognition                  Caltech101        Accuracy  39.2    31
