ProMix: Combating Label Noise via Maximizing Clean Sample Utility

About

Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheap to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. To address this, we propose a novel LNL framework, ProMix, that attempts to maximize the utility of clean samples for boosted performance. Key to our method is a matched high confidence selection technique that selects examples with high confidence scores and predictions that match the given labels, dynamically expanding a base clean sample set. To overcome the potential side effects of an excessive clean set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset. The code is available at https://github.com/Justherozen/ProMix

Ruixuan Xiao, Yiwen Dong, Haobo Wang, Lei Feng, Runze Wu, Gang Chen, Junbo Zhao • 2022
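
The matched high confidence selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only encodes the two stated conditions: the prediction matches the given label, and the confidence is high. The function name, the `threshold` value of 0.99, and the assumption that a base clean set is already available as a boolean mask are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def matched_high_confidence_selection(
    probs: Sequence[Sequence[float]],
    given_labels: Sequence[int],
    base_clean: Sequence[bool],
    threshold: float = 0.99,  # hypothetical value, not taken from the paper
) -> List[bool]:
    """Return a clean-sample mask expanded from a base clean set.

    A sample is added to the clean set when the model's top prediction
    matches its given (possibly noisy) label AND the confidence of that
    prediction reaches `threshold`. Samples already in the base clean
    set are kept.
    """
    selected = []
    for p, y, is_clean in zip(probs, given_labels, base_clean):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax class index
        matched = pred == y
        confident = p[pred] >= threshold
        selected.append(bool(is_clean or (matched and confident)))
    return selected


# Toy softmax outputs for four samples over three classes (made-up numbers).
probs = [
    [0.995, 0.003, 0.002],  # confident and matches label 0: added
    [0.60, 0.30, 0.10],     # matches label 0 but low confidence: not added
    [0.001, 0.998, 0.001],  # confident but given label is 2: not added
    [0.40, 0.35, 0.25],     # already in the base clean set: kept
]
given_labels = [0, 0, 2, 1]
base_clean = [False, False, False, True]

mask = matched_high_confidence_selection(probs, given_labels, base_clean)
print(mask)  # [True, False, False, True]
```

In this sketch, requiring both conditions means a confident prediction that disagrees with the given label is never promoted to the clean set, and neither is a matching but uncertain one. How the base clean set is built and how the threshold is chosen are left open here.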

Related benchmarks

Task	Dataset	Result
Semantic segmentation	S3DIS (Area 5)	mIoU 64.1	1006
Image Classification	CIFAR-10 (test)	Accuracy 95.5	882
Image Classification	Clothing1M (test)	Accuracy 74.94	598
Image Classification	CIFAR-100 (test)	Top-1 Accuracy 77.5	395
Semantic segmentation	ScanNet V2 (val)	mIoU 66.1	380
Image Classification	CIFAR-100 (test)	Accuracy 45.2	295
Image Classification	Food-101 (test)	Accuracy 70.1	145
Image Classification	ANIMAL-10N (test)	Accuracy 78.2	123
Image Classification	CIFAR-10N (Worst)	Accuracy 96.34	89
Image Classification	CIFAR-10N (Aggregate)	Accuracy 97.65	84

Showing 10 of 45 rows
