SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

About

Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched. Current approaches in this sphere often presuppose rigid assumptions regarding the class distribution of unlabeled data, thereby limiting the adaptability of models to only certain distribution ranges. In this study, we propose a novel approach, introducing a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data. Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization (EM) algorithm by explicitly decoupling the modeling of conditional and marginal class distributions. This separation facilitates a closed-form solution for class distribution estimation during the maximization phase, leading to the formulation of a Bayes classifier. The Bayes classifier, in turn, enhances the quality of pseudo-labels in the expectation phase. Remarkably, the SimPro framework not only comes with theoretical guarantees but also is straightforward to implement. Moreover, we introduce two novel class distributions broadening the scope of the evaluation. Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios. Our code is available at https://github.com/LeapLabTHU/SimPro.
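The alternation described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation (see their repository for that); it is a generic EM loop for estimating an unknown class prior, written under simplifying assumptions: the class-conditional scores are held fixed instead of being produced by a trained network, and the data are synthetic 1-D Gaussians. The name `estimate_class_prior` and all toy values are hypothetical. The E-step forms Bayes posteriors (soft pseudo-labels) from the current prior, and the M-step uses the closed-form update in which the new prior is the average of those posteriors.

```python
import math
import random


def estimate_class_prior(log_likelihood, num_iters=50):
    """Toy EM estimate of a marginal class distribution (hypothetical helper).

    log_likelihood[i][k] is an unnormalized log p(x_i | y = k), held fixed here.
    Returns the estimated prior and the soft pseudo-labels from the last E-step.
    """
    num_classes = len(log_likelihood[0])
    prior = [1.0 / num_classes] * num_classes  # start from a uniform prior
    posteriors = []
    for _ in range(num_iters):
        # E-step: Bayes posterior p(y | x) proportional to p(x | y) * p(y).
        posteriors = []
        for row in log_likelihood:
            scores = [ll + math.log(max(p, 1e-12)) for ll, p in zip(row, prior)]
            top = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            posteriors.append([w / total for w in weights])
        # M-step: closed-form update, the prior is the mean soft pseudo-label.
        prior = [
            sum(post[k] for post in posteriors) / len(posteriors)
            for k in range(num_classes)
        ]
    return prior, posteriors


# Synthetic "unlabeled" data with an imbalanced class distribution (assumed values).
random.seed(0)
true_prior = [0.7, 0.2, 0.1]
means = [-4.0, 0.0, 4.0]
labels = random.choices(range(3), weights=true_prior, k=3000)
xs = [random.gauss(means[y], 1.0) for y in labels]

# Unit-variance Gaussian log-likelihoods, up to an additive constant.
log_lik = [[-0.5 * (x - m) ** 2 for m in means] for x in xs]

prior_hat, pseudo_labels = estimate_class_prior(log_lik)
print([round(p, 3) for p in prior_hat])
```

In this toy setting the recovered prior should land close to `true_prior` even though the loop starts from a uniform guess, which is the sense in which no assumption about the unlabeled class distribution is needed. In the actual method the conditional model is a network trained jointly, which this sketch leaves out.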

Chaoqun Du, Yizeng Han, Gao Huang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Classification | CIFAR-10 | Accuracy | 16.91 | 108 |
| Image Classification | CIFAR10 LT (test) | Accuracy | 80.7 | 106 |
| Image Classification | STL10-LT (gamma_l = 10) (test) | Accuracy | 84.5 | 65 |
| Image Classification | CIFAR100-LT (test) | Top-1 Acc (Avg) | 43.1 | 65 |
| Image Classification | STL10 gamma_l = 20 long-tail (test) | Accuracy | 82.5 | 49 |
| Image Classification | ImageNet-127 (test) | Accuracy | 67 | 42 |
| Image Classification | CIFAR-10 LT uniform distribution, gamma_l=100, gamma_u=1 | Accuracy | 93.8 | 17 |
| Image Classification | CIFAR-10-LT reversed distribution, gamma_l=100, gamma_u=1/100 | Accuracy | 85.8 | 17 |
| Image Classification | CIFAR-100 | Accuracy (Uniform, M=24600, γu=1) | 19.1 | 12 |
| Image Classification | CIFAR-100-LT (γl = 20, N1 = 50, M1 = 400) (test) | Random Dist. 1 Accuracy | 44.9 | 3 |
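
The dataset names above use symbols (gamma_l, gamma_u, N1, M1) that this page does not define. The sketch below shows one plausible reading, assuming the exponential class-size profile commonly used to build long-tailed benchmarks: the head class has N1 labeled (or M1 unlabeled) samples and the imbalance ratio gamma is the head-to-tail size ratio. Treating gamma < 1 as the "reversed" ordering is also an assumption, and `long_tailed_counts` is a hypothetical helper, not something taken from the SimPro code.

```python
def long_tailed_counts(head_count, gamma, num_classes):
    """Per-class sample counts under an assumed exponential long-tail profile.

    gamma > 1: counts decay from head_count down to about head_count / gamma.
    gamma == 1: uniform, every class has head_count samples.
    gamma < 1: read as the reversed case, i.e. the gamma' = 1 / gamma profile
               in reverse class order (an assumption of this sketch).
    """
    ratio = gamma if gamma >= 1 else 1.0 / gamma
    counts = [
        int(head_count * ratio ** (-k / (num_classes - 1)))
        for k in range(num_classes)
    ]
    return counts if gamma >= 1 else counts[::-1]


# Values taken from the last table row: gamma_l = 20, N1 = 50, M1 = 400, 100 classes.
labeled = long_tailed_counts(50, 20, 100)
uniform_unlabeled = long_tailed_counts(400, 1, 100)
reversed_unlabeled = long_tailed_counts(400, 1 / 20, 100)
print(labeled[0], labeled[-1], sum(labeled))
```

Under this reading, the "uniform" and "reversed" rows describe unlabeled sets whose class distribution deliberately differs from the labeled one, which is the mismatch the abstract refers to.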