Extracting Clean and Balanced Subset for Noisy Long-tailed Classification
About
Real-world datasets are usually class-imbalanced and corrupted by label noise. To address the joint problem of long-tailed distributions and label noise, most previous works design a noise detector to distinguish noisy samples from clean ones. Despite their effectiveness, these methods may be limited in handling the two issues in a unified way. In this work, we develop a novel pseudo-labeling method that uses class prototypes from the perspective of distribution matching, which can be solved with optimal transport (OT). By setting a manually specified probability measure and using the learned transport plan to pseudo-label the training samples, the proposed method reduces the side effects of noisy and long-tailed data simultaneously. We then introduce a simple yet effective filter criterion that combines the observed labels and pseudo-labels to obtain a more balanced and less noisy subset for robust model training. Extensive experiments demonstrate that our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
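The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the class prototypes are taken to be the mean feature of each observed class, the transport cost is cosine distance, the manually specified class measure is uniform, the OT problem is solved with entropy-regularized Sinkhorn iterations, and a sample is kept when its pseudo-label agrees with its observed label. The names `sinkhorn` and `extract_subset` are hypothetical.

```python
import numpy as np


def sinkhorn(cost, row_marginal, col_marginal, eps=0.05, n_iters=200):
    """Entropy-regularized OT solved with Sinkhorn iterations.

    Returns a transport plan of shape (n_samples, n_classes).
    """
    kernel = np.exp(-cost / eps)
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        # Alternately rescale columns and rows toward the target marginals.
        v = col_marginal / (kernel.T @ u)
        u = row_marginal / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def extract_subset(features, observed_labels, n_classes, eps=0.05):
    """Pseudo-label samples with OT and keep those that match their labels.

    Assumes every class has at least one sample among the observed labels.
    """
    # Assumption: a class prototype is the mean feature of its observed class.
    prototypes = np.stack(
        [features[observed_labels == c].mean(axis=0) for c in range(n_classes)]
    )

    # Assumption: cosine distance between samples and prototypes as the cost.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cost = 1.0 - f @ p.T

    # Assumption: uniform mass over samples and a uniform class measure,
    # which pushes the transport plan toward class balance.
    n_samples = features.shape[0]
    sample_mass = np.full(n_samples, 1.0 / n_samples)
    class_mass = np.full(n_classes, 1.0 / n_classes)
    plan = sinkhorn(cost, sample_mass, class_mass, eps=eps)

    # Pseudo-label: the class that receives most of each sample's mass.
    pseudo_labels = plan.argmax(axis=1)

    # Assumption: the filter keeps samples whose two labels agree.
    keep = pseudo_labels == observed_labels
    return pseudo_labels, keep
```

Calling `extract_subset(features, observed_labels, n_classes)` on an `(n, d)` feature array and an integer label array returns the pseudo-labels and a boolean mask selecting the retained subset. The uniform class measure is the piece that counteracts the long tail in this sketch; a different target measure would change how strongly tail classes are favored.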
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10-LT | Top-1 Accuracy | 89.1 | 127 |
| Image Classification | CIFAR-100 LTN | Top-1 Accuracy | 63.6 | 60 |
| Image Classification | Img-LTNr (test) | Top-1 Accuracy | 50.8 | 34 |
| Image Classification | CIFAR-10-LTN Symmetric Noise (test) | Top-1 Accuracy | 86.4 | 34 |
| Image Classification | CIFAR-100-LTN Symmetric Noise (test) | Top-1 Accuracy | 56.7 | 34 |
| Image Classification | CIFAR-100 LTN (Asymmetric Noise) (test) | Top-1 Accuracy | 60.5 | 34 |
| Image Classification | WebVision 50 (test) | Top-1 Accuracy | 80 | 29 |
| Image Classification | CIFAR-100-LTN Symmetric Noise IF=10, NR=40% (test) | Accuracy | 56.7 | 20 |
| Image Classification | CIFAR-100-LTN Asymmetric Noise IF=10, NR=40% (test) | Accuracy | 52.1 | 20 |
| Image Classification | CIFAR-100 Long-Tailed Noisy IF=10 Symmetric Noise 0.6 (test) | Top-1 Accuracy | 48.1 | 18 |