
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

About

Fine-tuning vision-language models (VLMs) with abundant unlabeled data has recently attracted increasing attention. Existing methods that resort to the pseudolabeling strategy suffer from heavily incorrect hard pseudolabels when VLMs exhibit low zero-shot performance in downstream tasks. To alleviate this issue, we propose a Candidate Pseudolabel Learning method, termed CPL, to fine-tune VLMs with suitable candidate pseudolabels of unlabeled data in downstream tasks. The core of our method lies in the generation strategy of candidate pseudolabels, which progressively generates refined candidate pseudolabels by both intra- and inter-instance label selection, based on a confidence score matrix for all unlabeled data. This strategy results in better performance in true label inclusion and class-balanced instance selection. In this way, we can directly apply existing loss functions to learn with the generated candidate pseudolabels. Extensive experiments on nine benchmark datasets with three learning paradigms demonstrate the effectiveness of our method. Our code can be found at https://github.com/vanillaer/CPL-ICML2024.
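The candidate-set generation described above can be sketched in code. The snippet below is an illustrative simplification, not the authors' exact algorithm: it assumes a precomputed (N, C) confidence score matrix (e.g., softmax over VLM zero-shot logits for N unlabeled samples and C classes), performs intra-instance selection (top-k classes per sample) and inter-instance selection (top-ranked samples per class, which encourages class balance), and intersects the two to form each sample's candidate pseudolabel set. The parameters `k` and `top_frac` are hypothetical knobs for the sketch.

```python
import numpy as np

def candidate_pseudolabels(conf, k=3, top_frac=0.5):
    """Illustrative sketch of candidate pseudolabel generation.

    conf: (N, C) confidence score matrix for N unlabeled samples
    over C classes (e.g., softmax of VLM zero-shot logits).
    Returns an (N, C) boolean matrix; row i marks sample i's
    candidate label set.
    """
    n, c = conf.shape

    # Intra-instance selection: for each sample, keep its top-k classes.
    topk_idx = np.argsort(-conf, axis=1)[:, :k]
    intra = np.zeros((n, c), dtype=bool)
    np.put_along_axis(intra, topk_idx, True, axis=1)

    # Inter-instance selection: for each class, keep the samples whose
    # confidence in that class ranks in the top fraction of all samples,
    # which helps keep the selected instances class-balanced.
    m = max(1, int(top_frac * n))
    top_rows = np.argsort(-conf, axis=0)[:m, :]
    inter = np.zeros((n, c), dtype=bool)
    np.put_along_axis(inter, top_rows, True, axis=0)

    # A class enters a sample's candidate set only if it survives both
    # selections; fall back to the argmax class if the set comes out empty.
    cand = intra & inter
    empty = ~cand.any(axis=1)
    cand[empty, conf[empty].argmax(axis=1)] = True
    return cand
```

Once candidate sets are formed, any off-the-shelf partial-label (candidate-label) loss can be applied to the rows of the boolean matrix, which is what lets the method reuse existing loss functions directly.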

Jiahan Zhang, Qi Wei, Feng Liu, Lei Feng · 2024

Related benchmarks

Task                  Dataset                 Result               Rank
Image Classification  Flowers102              Accuracy 65.73       478
Image Classification  DTD                     Accuracy 50.11       419
Image Classification  UCF101                  Top-1 Acc 68.23      404
Image Classification  Food101                 Accuracy 88.64       309
Image Classification  StanfordCars            Accuracy 58.23       266
Image Classification  FGVC-Aircraft (test)    --                   231
Image Classification  FGVCAircraft            Accuracy 22.86       225
Image Classification  DTD (test)              Accuracy 68          181
Image Classification  Caltech101              Accuracy 82.8        162
Image Classification  Flowers-102 (test)      Top-1 Accuracy 89.6  124

(Showing 10 of 14 rows.)
