Vision-Language Models are Strong Noisy Label Detectors

About

Recent research on fine-tuning vision-language models has demonstrated impressive performance in various downstream tasks. However, the challenge of obtaining accurately labeled data in real-world applications poses a significant obstacle during the fine-tuning process. To address this challenge, this paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models. DeFT utilizes the robust alignment of textual and visual features pre-trained on millions of auxiliary image-text pairs to sieve out noisy labels. The proposed framework establishes a noisy label detector by learning positive and negative textual prompts for each class. The positive prompt seeks to reveal distinctive features of the class, while the negative prompt serves as a learnable threshold for separating clean and noisy samples. We employ parameter-efficient fine-tuning for the adaptation of a pre-trained visual encoder to promote its alignment with the learned textual prompts. As a general framework, DeFT can seamlessly fine-tune many pre-trained models to downstream tasks by utilizing carefully selected clean samples. Experimental results on seven synthetic and real-world noisy datasets validate the effectiveness of DeFT in both noisy label detection and image classification.

Tong Wei, Hao-Tian Li, Chun-Shu Li, Jiang-Xin Shi, Yu-Feng Li, Min-Ling Zhang • 2024
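
The detection rule described in the abstract, where the negative prompt acts as a per-class threshold, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that image features and prompt features are already available as vectors in a shared embedding space, and that a label counts as clean when the image is more similar to its class's positive prompt than to the same class's negative prompt. The function names (`l2_normalize`, `detect_clean`) and the toy 2-D features are hypothetical; in DeFT the features would come from a pre-trained vision-language model and the prompts would be learned.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def detect_clean(image_feats, pos_prompts, neg_prompts, labels):
    """Return a boolean mask that is True where the given label looks clean.

    Assumption (not confirmed by the source): a sample is kept when its
    similarity to the positive prompt of its labeled class exceeds its
    similarity to the negative prompt of that class.
    """
    img = l2_normalize(image_feats)
    pos = l2_normalize(pos_prompts)[labels]  # positive prompt of each sample's label
    neg = l2_normalize(neg_prompts)[labels]  # negative prompt of each sample's label
    sim_pos = np.sum(img * pos, axis=1)
    sim_neg = np.sum(img * neg, axis=1)
    return sim_pos > sim_neg


# Toy data: two classes in a 2-D embedding space (hand-made, illustrative only).
pos_prompts = np.array([[1.0, 0.0], [0.0, 1.0]])
neg_prompts = np.array([[1.0, 1.0], [1.0, 1.0]])
image_feats = np.array([[1.0, 0.1], [1.0, 0.1], [0.1, 1.0]])
labels = np.array([0, 1, 1])  # the second sample carries the wrong label

mask = detect_clean(image_feats, pos_prompts, neg_prompts, labels)
print(mask)
```

In this toy setup the second sample is flagged as noisy because its feature vector points toward class 0 while its label says class 1. Only the samples where `mask` is True would then be passed on to fine-tuning.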

Related benchmarks

Task	Dataset	Result
Image Classification	Clothing1M (test)	Accuracy 72.44	598
Image Classification	Oxford Pets (test)	Accuracy 88.83	125
Image Classification	CIFAR-100-N	Accuracy 79.04	62
Image Classification	Webvision (test)	Accuracy 85.12	57
Mislabeled Data Detection	DeepDRiD	F1 Score 62.95	55
Mislabeled Data Detection	ISIC	F1 Score 55.4	55
Mislabeled Data Detection	Panda	F1 Score 69.59	55
Noisy label detection	CIFAR-100N natural label noise (train)	F1 Score 90.58	19
Image Classification	CIFAR-100 40% symmetric noise (test)	--	19
Image Classification	CIFAR-100 60% symmetric noise (test)	--	19

Showing 10 of 34 rows
