ScanMix: Learning from Severe Label Noise via Semantic Clustering and Semi-Supervised Learning
About
We propose a new training algorithm, ScanMix, that combines semantic clustering and semi-supervised learning (SSL) to achieve superior robustness to severe label noise and competitive robustness to non-severe label noise, compared with state-of-the-art (SOTA) methods. ScanMix is based on the expectation-maximisation (EM) framework: the E-step estimates the latent variable that clusters the training images based on their appearance and classification results, and the M-step optimises the SSL classifier and learns effective feature representations via semantic clustering. We present a theoretical result showing the correctness and convergence of ScanMix, and empirical results showing that ScanMix achieves SOTA results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and WebVision. In all benchmarks with severe label noise, our results are competitive with the current SOTA.
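To make the E-step/M-step alternation more concrete, here is a toy sketch in plain Python. It is not ScanMix's implementation: in the paper the M-step trains a deep network with SSL and semantic clustering, whereas here it only recomputes cluster centroids. The assumptions are ours: one cluster per class, an E-step score equal to an "appearance" term (a softmax over negative squared distances to the centroids) multiplied by the classifier's probability for that cluster's class, and the hypothetical function names `e_step`, `m_step` and `run_em`.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def e_step(features: Matrix, centroids: Matrix, class_probs: Matrix) -> List[int]:
    """Hypothetical E-step: assign each sample to the cluster with the highest
    (appearance likelihood x classifier probability). Cluster k is assumed to
    correspond to class k."""
    assignments = []
    for feat, probs in zip(features, class_probs):
        # Appearance term: softmax over negative squared distances to centroids
        logits = [-squared_distance(feat, c) for c in centroids]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]  # shift for numerical stability
        total = sum(exps)
        appearance = [e / total for e in exps]
        # Combine appearance with the classifier's prediction for each cluster
        scores = [a * p for a, p in zip(appearance, probs)]
        assignments.append(max(range(len(scores)), key=scores.__getitem__))
    return assignments


def m_step(features: Matrix, assignments: List[int], old_centroids: Matrix) -> Matrix:
    """Toy M-step: recompute each centroid as the mean of its members
    (a stand-in for ScanMix's network training step)."""
    new_centroids = []
    for k, old in enumerate(old_centroids):
        members = [f for f, a in zip(features, assignments) if a == k]
        if not members:
            new_centroids.append(list(old))  # empty cluster: keep old centroid
            continue
        dim = len(members[0])
        new_centroids.append(
            [sum(m[d] for m in members) / len(members) for d in range(dim)]
        )
    return new_centroids


def run_em(
    features: Matrix, class_probs: Matrix, centroids: Matrix, iterations: int = 10
) -> Tuple[List[int], Matrix]:
    """Alternate E- and M-steps until the assignments stop changing."""
    assignments: List[int] = []
    for _ in range(iterations):
        new_assignments = e_step(features, centroids, class_probs)
        if new_assignments == assignments:
            break  # converged: centroids already match these assignments
        assignments = new_assignments
        centroids = m_step(features, assignments, centroids)
    return assignments, centroids


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
    probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
    print(run_em(feats, probs, [[1.0, 1.0], [4.0, 4.0]]))
```

Multiplying the two terms means that when a sample is equally close to two clusters, the classifier's prediction breaks the tie. This loosely mirrors the abstract's point that the latent clustering uses both appearance and classification results.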
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Image Classification | Clothing1M (test) | Accuracy | 74.35 | 546 |
| Image Classification | ILSVRC 2012 (test) | Top-1 Accuracy | 75.76 | 117 |
| Image Classification | CIFAR-100 (test) | Accuracy (Symmetric 20%) | 77 | 72 |
| Image Classification | WebVision (test) | Accuracy | 80.04 | 57 |
| Image Classification | Red Mini-ImageNet (test) | Accuracy | 59.06 | 51 |
| Image Classification | CIFAR-10 (test) | Accuracy (Symmetric 20%) | 96 | 22 |
| Image Classification | CIFAR-10 semantic asymmetric noise (test) | Accuracy | 89.96 | 21 |
| Image Classification | CIFAR-100 semantic noise (test) | Accuracy | 68.44 | 21 |