DivideMix: Learning with Noisy Labels as Semi-supervised Learning

About

Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at https://github.com/LiJunnan1992/DivideMix .

Junnan Li, Richard Socher, Steven C.H. Hoi • 2020
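The data-division step described in the abstract can be illustrated with a small sketch. It assumes a two-component Gaussian mixture fitted by EM on per-sample losses and a clean-probability threshold of 0.5. The abstract only says "a mixture model", so both choices, along with the hypothetical helper names `fit_two_gaussians`, `posteriors`, and `divide`, are illustrative assumptions and not the authors' code.

```python
import math


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component for one loss value."""
    # Work in log space so far-away points do not underflow to 0/0.
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_two_gaussians(losses, iters=50):
    """Fit a 1-D two-component Gaussian mixture with plain EM (assumed model)."""
    lo, hi = min(losses), max(losses)
    means = [lo, hi]  # initialise one component at each extreme
    spread = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample.
        resp = [posteriors(x, weights, means, variances) for x in losses]
        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(losses)
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


def divide(losses, threshold=0.5):
    """Split sample indices into a 'labeled' (likely clean) and 'unlabeled' set."""
    weights, means, variances = fit_two_gaussians(losses)
    clean = 0 if means[0] <= means[1] else 1  # low-loss component = clean
    probs = [posteriors(x, weights, means, variances)[clean] for x in losses]
    labeled = [i for i, p in enumerate(probs) if p > threshold]
    unlabeled = [i for i, p in enumerate(probs) if p <= threshold]
    return labeled, unlabeled, probs


# Toy losses: 80 low-loss samples followed by 20 high-loss samples.
toy_losses = [0.05 + 0.001 * i for i in range(80)] + [1.8 + 0.02 * i for i in range(20)]
labeled_idx, unlabeled_idx, clean_prob = divide(toy_losses)
print(len(labeled_idx), len(unlabeled_idx))
```

In the two-network setup the abstract describes, each network would be trained on the division produced from the other network's losses, which this single-model sketch does not show.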

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-100 (test)	Accuracy 77.3	3518
Image Classification	CIFAR-10 (test)	Accuracy 96.2	3381
Image Classification	ImageNet (val)	Top-1 Acc 75.2	1206
Image Classification	CIFAR-10 (test)	Accuracy 94.9	882
Image Classification	CIFAR-100	Accuracy 77.3	691
Image Classification	Clothing1M (test)	Accuracy 74.8	598
Fine-grained Image Classification	CUB200 2011 (test)	Accuracy 72.76	567
Image Classification	CIFAR-10	Accuracy 96.1	564
Image Classification	CIFAR-10	Accuracy 85.71	507
Image Classification	ImageNet ILSVRC-2012 (val)	Top-1 Accuracy 75.2	441

Showing 10 of 283 rows
