Extracting Clean and Balanced Subset for Noisy Long-tailed Classification

About

Real-world datasets are often class-imbalanced and corrupted by label noise. To address long-tailed distributions and label noise jointly, most previous works design a noise detector to distinguish noisy samples from clean ones. Despite their effectiveness, these approaches may be limited in handling both issues in a unified way. In this work, we develop a novel pseudo-labeling method using class prototypes from the perspective of distribution matching, which can be solved with optimal transport (OT). By setting a manually specified probability measure and using the learned transport plan to pseudo-label the training samples, the proposed method can reduce the side effects of noisy and long-tailed data simultaneously. We then introduce a simple yet effective filter criterion that combines the observed labels and pseudo labels to obtain a more balanced and less noisy subset for robust model training. Extensive experiments demonstrate that our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.

Zhuo Li, He Zhao, Zhen Li, Tongliang Liu, Dandan Guo, Xiang Wan • 2024
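
The pipeline described in the abstract (prototype-based pseudo-labeling via OT, then filtering against the observed labels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The following are all assumptions made for illustration: the entropic Sinkhorn solver, the cosine cost, prototypes computed as per-class feature means, a uniform class measure standing in for the "manually specified probability measure", and an agreement rule (pseudo label equals observed label) standing in for the filter criterion. The names `sinkhorn` and `extract_subset` are hypothetical.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropic OT solver (assumed here; the abstract only says "OT").

    Returns a transport plan whose row sums approximate `a` and whose
    column sums match `b`.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def extract_subset(features, observed_labels, n_classes):
    """Pseudo-label samples with an OT plan, then keep agreeing samples.

    Assumes every class appears at least once in `observed_labels`.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)

    # Class prototypes: mean feature per observed class (an assumption).
    protos = np.stack(
        [feats[observed_labels == k].mean(axis=0) for k in range(n_classes)]
    )
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)

    cost = 1.0 - feats @ protos.T    # cosine distance to each prototype

    n = feats.shape[0]
    a = np.full(n, 1.0 / n)                   # uniform mass over samples
    b = np.full(n_classes, 1.0 / n_classes)   # uniform (balanced) class measure

    plan = sinkhorn(cost, a, b)
    pseudo = plan.argmax(axis=1)     # pseudo label = class receiving most mass

    # Illustrative filter: keep samples whose pseudo and observed labels agree.
    keep = pseudo == observed_labels
    return pseudo, keep
```

Under these assumptions, the indices where `keep` is true form the candidate subset on which a classifier would then be trained. The balanced class measure `b` is what pushes the pseudo labels away from the head classes in this sketch.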

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-10-LT | Top-1 Accuracy 89.1 | 127 |
| Image Classification | CIFAR-100 LTN | Top-1 Accuracy 63.6 | 60 |
| Image Classification | Img-LTNr (test) | Top-1 Accuracy 50.8 | 34 |
| Image Classification | CIFAR-10-LTN Symmetric Noise (test) | Top-1 Accuracy 86.4 | 34 |
| Image Classification | CIFAR-100-LTN Symmetric Noise (test) | Top-1 Accuracy 56.7 | 34 |
| Image Classification | CIFAR-100 LTN (Asymmetric Noise) (test) | Top-1 Accuracy 60.5 | 34 |
| Image Classification | WebVision 50 (test) | Top-1 Accuracy 80 | 29 |
| Image Classification | CIFAR-100-LTN Symmetric Noise IF=10, NR=40% (test) | Accuracy 56.7 | 20 |
| Image Classification | CIFAR-100-LTN Asymmetric Noise IF=10, NR=40% (test) | Accuracy 52.1 | 20 |
| Image Classification | CIFAR-100 Long-Tailed Noisy IF=10 Symmetric Noise 0.6 (test) | Top-1 Accuracy 48.1 | 18 |
Showing 10 of 15 rows
