Vision-Language Models are Strong Noisy Label Detectors

About

Recent research on fine-tuning vision-language models has demonstrated impressive performance in various downstream tasks. However, the challenge of obtaining accurately labeled data in real-world applications poses a significant obstacle during the fine-tuning process. To address this challenge, this paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models. DeFT utilizes the robust alignment of textual and visual features pre-trained on millions of auxiliary image-text pairs to sieve out noisy labels. The proposed framework establishes a noisy label detector by learning positive and negative textual prompts for each class. The positive prompt seeks to reveal distinctive features of the class, while the negative prompt serves as a learnable threshold for separating clean and noisy samples. We employ parameter-efficient fine-tuning for the adaptation of a pre-trained visual encoder to promote its alignment with the learned textual prompts. As a general framework, DeFT can seamlessly fine-tune many pre-trained models to downstream tasks by utilizing carefully selected clean samples. Experimental results on seven synthetic and real-world noisy datasets validate the effectiveness of DeFT in both noisy label detection and image classification.
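The detection rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `detect_clean`, the use of cosine similarity, and the toy 2-D features are all assumptions made for illustration. In the actual framework the features would come from a pre-trained vision-language model and the prompts would be learned. The sketch only shows the idea that a sample is kept as clean when its image feature is more similar to the positive prompt of its given label than to that class's negative prompt, which acts as a per-class threshold.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def detect_clean(image_feats, pos_text_feats, neg_text_feats, labels):
    """Return a boolean mask that is True where a label looks clean.

    image_feats:    (N, D) image features (assumed to come from a visual encoder)
    pos_text_feats: (C, D) features of the positive prompt for each class
    neg_text_feats: (C, D) features of the negative prompt for each class
    labels:         (N,)   given, possibly noisy, integer labels
    """
    img = l2_normalize(image_feats)
    pos = l2_normalize(pos_text_feats)
    neg = l2_normalize(neg_text_feats)
    idx = np.arange(len(labels))
    # Similarity of each image to the prompts of its *given* label only.
    sim_pos = (img @ pos.T)[idx, labels]
    sim_neg = (img @ neg.T)[idx, labels]
    # The negative prompt plays the role of a per-class threshold.
    return sim_pos > sim_neg

# Toy data (hypothetical): two classes in a 2-D feature space.
pos = np.array([[1.0, 0.0], [0.0, 1.0]])
neg = np.array([[1.0, 1.0], [1.0, 1.0]])
images = np.array([[1.0, 0.1], [0.1, 1.0], [1.0, 0.1]])
labels = np.array([0, 1, 1])  # the third label disagrees with its image

clean_mask = detect_clean(images, pos, neg, labels)
print(clean_mask)
```

In this toy setup the third sample is flagged as noisy because its feature lies close to class 0 while its label says class 1. The samples that pass the mask are the ones that would be used for the subsequent fine-tuning step.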

Tong Wei, Hao-Tian Li, Chun-Shu Li, Jiang-Xin Shi, Yu-Feng Li, Min-Ling Zhang • 2024

Related benchmarks

Task                  | Dataset                                          | Result                  | Rank
Image Classification  | Clothing1M (test)                                | Accuracy 72.44          | 546
Image Classification  | Webvision (test)                                 | Accuracy 85.12          | 57
Image Classification  | CIFAR-100 40% symmetric noise (test)             | --                      | 19
Image Classification  | CIFAR-100 60% symmetric noise (test)             | --                      | 19
Image Classification  | CIFAR-100 N (test)                               | --                      | 19
Image Classification  | CIFAR-100-N                                      | --                      | 11
Noisy label detection | CIFAR-100N natural label noise (train)           | Precision 88.43         | 8
Image Classification  | Tiny-ImageNet Symmetric Noise 0.2 (test)         | Accuracy (Best) 0.8291  | 5
Image Classification  | CIFAR-100 Symmetric Noise 0.2 (test)             | Accuracy (Best) 89.38   | 5
Image Classification  | CIFAR-100 Instance-dependent Noise 0.2 (test)    | Accuracy (Best) 89.38   | 5
Showing 10 of 29 rows
