
Bi-CoG: Bi-Consistency-Guided Self-Training for Vision-Language Models

About

Exploiting unlabeled data through semi-supervised learning (SSL) and leveraging pre-trained models via fine-tuning are two prevailing paradigms for addressing label-scarce scenarios. Recently, growing attention has been given to combining the fine-tuning of pre-trained vision-language models (VLMs) with SSL, forming the emerging paradigm of semi-supervised fine-tuning. However, existing methods often suffer from model bias and hyperparameter sensitivity because they rely on prediction consistency or pre-defined confidence thresholds. To address these limitations, we propose a simple yet effective plug-and-play method named Bi-Consistency-Guided Self-Training (Bi-CoG), which assigns high-quality, low-bias pseudo-labels by simultaneously exploiting inter-model and intra-model consistency, together with an error-aware dynamic pseudo-label assignment strategy. Both theoretical analysis and extensive experiments on 14 datasets demonstrate the effectiveness of Bi-CoG, which consistently and significantly improves the performance of existing methods.

Rui Zhu, Song-Lin Lv, Zi-Kang Wang, Lan-Zhe Guo • 2025
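The abstract does not spell out how the two consistency signals are computed. As a rough illustration only, the sketch below assumes that inter-model consistency means two models (hypothetically called model A and model B) predict the same class for an unlabeled image, and that intra-model consistency means model A predicts the same class for two augmented views (weak and strong) of that image. A pseudo-label is kept only when both checks pass, so no confidence threshold is involved. The function name `bi_consistent_pseudo_labels` and the weak/strong view setup are hypothetical, and the paper's error-aware dynamic assignment strategy is not modeled here.

```python
import numpy as np


def bi_consistent_pseudo_labels(logits_a_weak, logits_a_strong, logits_b_weak):
    """Hypothetical sketch of consistency-based pseudo-label selection.

    Assumptions (not taken from the paper):
      - intra-model consistency: model A gives the same class for two
        augmented views (weak and strong) of each unlabeled sample;
      - inter-model consistency: models A and B agree on the weak view.
    All inputs are (num_samples, num_classes) logit arrays.
    Returns the candidate labels and a boolean mask of accepted samples.
    """
    pred_a_weak = np.argmax(logits_a_weak, axis=1)
    pred_a_strong = np.argmax(logits_a_strong, axis=1)
    pred_b_weak = np.argmax(logits_b_weak, axis=1)

    intra_ok = pred_a_weak == pred_a_strong  # same model, different views
    inter_ok = pred_a_weak == pred_b_weak    # different models, same view
    accept = intra_ok & inter_ok             # no confidence threshold involved
    return pred_a_weak, accept


if __name__ == "__main__":
    # Toy logits for 3 unlabeled samples and 2 classes.
    a_weak = np.array([[2.0, 1.0], [0.1, 0.9], [3.0, 0.0]])
    a_strong = np.array([[1.5, 0.5], [0.8, 0.2], [2.0, 1.0]])
    b_weak = np.array([[1.0, 0.0], [0.2, 0.7], [0.0, 1.0]])

    labels, mask = bi_consistent_pseudo_labels(a_weak, a_strong, b_weak)
    print(labels.tolist(), mask.tolist())  # [0, 1, 0] [True, False, False]
```

In the toy example, sample 1 fails the intra-model check and sample 2 fails the inter-model check, so only sample 0 receives a pseudo-label. Requiring agreement on both axes discards more samples than either check alone; this is one plausible reading of how combining the two signals could reduce pseudo-label bias, not a description of the paper's exact procedure.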

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | StanfordCars | Accuracy | 73.77 | 384
Image Classification | FGVCAircraft | Accuracy | 31.98 | 289
Image Classification | OxfordPets | H Score | 96.93 | 182
Image Classification | ImageNet-100 | Accuracy | 90.15 | 163
Image Classification | Food101 | Base Accuracy | 87.46 | 69
Image Classification | Caltech101 | Base Accuracy | 95.09 | 68
Skin lesion classification | ISIC 2018 (test) | -- | -- | 30
Image Classification | CIFAR-10 low-resolution (test) | Accuracy | 91.07 | 14
Image Classification | CIFAR-100 low-resolution (test) | Accuracy | 66.84 | 14
Action Recognition | UCF101 | Harmonic Mean (HM) | 83.09 | 7
Showing 10 of 19 rows
