Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization
About
Real-world large-scale datasets are heteroskedastic and imbalanced: labels have varying levels of uncertainty, and label distributions are long-tailed. Heteroskedasticity and imbalance challenge deep learning algorithms because it is difficult to distinguish among mislabeled, ambiguous, and rare examples. Addressing heteroskedasticity and imbalance simultaneously remains under-explored. We propose a data-dependent regularization technique for heteroskedastic datasets that regularizes different regions of the input space differently. Inspired by the theoretical derivation of the optimal regularization strength in a one-dimensional nonparametric classification setting, our approach adaptively regularizes data points in higher-uncertainty, lower-density regions more heavily. We test our method on several benchmark tasks, including a real-world heteroskedastic and imbalanced dataset, WebVision. Our experiments corroborate our theory and demonstrate a significant improvement over other methods in noise-robust deep learning.
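To make the core idea concrete (heavier regularization for examples in higher-uncertainty, lower-density regions), here is a minimal sketch under stated assumptions. It is not the paper's derived formula. The uncertainty proxy (predictive entropy), the density proxy (empirical class frequency), the rule `((entropy + eps) / (freq + eps)) ** alpha`, the parameters `alpha` and `base_lambda`, and all function names are hypothetical choices for illustration. The per-example penalty is left abstract; in practice it might be something like a squared input-gradient norm, but here it is just a number supplied by the caller.

```python
import math
from collections import Counter


def predictive_entropy(probs):
    """Shannon entropy (nats) of one predicted class distribution; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_strengths(prob_rows, labels, alpha=0.5, eps=1e-8):
    """Per-example regularization strengths (hypothetical rule, not the paper's).

    raw_i = ((entropy_i + eps) / (class_freq_i + eps)) ** alpha,
    rescaled so that the strengths average to 1.
    Uncertainty proxy: predictive entropy. Density proxy: class frequency.
    """
    counts = Counter(labels)
    n = len(labels)
    raw = []
    for probs, y in zip(prob_rows, labels):
        uncertainty = predictive_entropy(probs)
        density = counts[y] / n
        raw.append(((uncertainty + eps) / (density + eps)) ** alpha)
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def regularized_loss(per_example_loss, per_example_penalty, strengths, base_lambda=0.1):
    """Mean of loss_i + base_lambda * strength_i * penalty_i over the batch."""
    total = sum(
        loss + base_lambda * s * pen
        for loss, s, pen in zip(per_example_loss, strengths, per_example_penalty)
    )
    return total / len(per_example_loss)


# Usage: three confident examples from a common class, one uncertain example from a rare class.
prob_rows = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]
labels = [0, 0, 0, 1]
strengths = adaptive_strengths(prob_rows, labels)
batch_loss = regularized_loss([1.0] * 4, [1.0] * 4, strengths)
```

In this toy batch the rare, uncertain fourth example receives a larger strength than the confident majority-class examples. Normalizing the strengths to mean 1 keeps `base_lambda` interpretable as the average regularization weight.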
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | Accuracy | 51.04 | 3518 |
| Image Classification | CIFAR-10 (test) | Accuracy | 84.09 | 3381 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 70.3 | 354 |
| Image Classification | CIFAR-100 (test) | -- | -- | 72 |
| Image Classification | WebVision 1.0 (val) | Top-1 Accuracy | 75 | 59 |
| Image Classification | WebVision (val) | Top-1 Accuracy | 75.5 | 40 |
| Image Classification | CIFAR-100 Clean (test) | Accuracy | 56.89 | 38 |
| Image Classification | CIFAR-10 Clean (test) | Accuracy | 87.81 | 30 |
| Image Classification | CIFAR-10-N-LT, imbalance ratio 100 | Accuracy (Noise 0.1) | 79.02 | 20 |
| Image Classification | CIFAR-10-N-LT, imbalance ratio 10 | Accuracy (NR 0.1) | 0.8703 | 20 |