Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

About

Real-world large-scale datasets are heteroskedastic and imbalanced -- labels have varying levels of uncertainty and label distributions are long-tailed. Heteroskedasticity and imbalance challenge deep learning algorithms due to the difficulty of distinguishing among mislabeled, ambiguous, and rare examples. Addressing heteroskedasticity and imbalance simultaneously is under-explored. We propose a data-dependent regularization technique for heteroskedastic datasets that regularizes different regions of the input space differently. Inspired by the theoretical derivation of the optimal regularization strength in a one-dimensional nonparametric classification setting, our approach adaptively regularizes the data points in higher-uncertainty, lower-density regions more heavily. We test our method on several benchmark tasks, including a real-world heteroskedastic and imbalanced dataset, WebVision. Our experiments corroborate our theory and demonstrate a significant improvement over other methods in noise-robust deep learning.

Kaidi Cao, Yining Chen, Junwei Lu, Nikos Arechiga, Adrien Gaidon, Tengyu Ma • 2020
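The abstract describes regularizing examples in higher-uncertainty, lower-density regions more heavily. The sketch below illustrates that idea only; it is not the paper's derived regularization strength. The ratio `uncertainty / density`, the use of class frequency as a density proxy, and all function names (`class_density`, `adaptive_weights`, `regularized_objective`) are assumptions made for illustration.

```python
from collections import Counter


def class_density(labels):
    """Empirical frequency of each example's class, used as a crude density proxy (assumption)."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[y] / n for y in labels]


def adaptive_weights(uncertainty, density, base_strength=1.0, eps=1e-8):
    """Per-example regularization weights that grow with uncertainty and shrink with density.

    The ratio below is a placeholder functional form, not the formula from the paper.
    Weights are rescaled so that their mean equals base_strength.
    """
    if len(uncertainty) != len(density):
        raise ValueError("uncertainty and density must have the same length")
    raw = [u / (d + eps) for u, d in zip(uncertainty, density)]
    mean_raw = sum(raw) / len(raw)
    if mean_raw == 0:
        # No example carries any uncertainty: apply no extra regularization.
        return [0.0 for _ in raw]
    return [base_strength * r / mean_raw for r in raw]


def regularized_objective(losses, penalties, weights):
    """Mean over examples of (loss_i + weight_i * penalty_i)."""
    terms = [l + w * p for l, p, w in zip(losses, penalties, weights)]
    return sum(terms) / len(terms)


# Toy data: a frequent, fairly clean class and a rare, noisier class.
labels = ["cat"] * 8 + ["lynx"] * 2
uncertainty = [0.1] * 8 + [0.4] * 2  # hypothetical per-example noise estimates
weights = adaptive_weights(uncertainty, class_density(labels))
```

In this toy setup the two rare, noisier `lynx` examples receive larger weights than the eight `cat` examples, while the average weight stays at `base_strength`, so the overall regularization budget is unchanged and only its distribution across examples shifts.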

Related benchmarks

Task	Dataset	Metric	Result	
Image Classification	CIFAR-100 (test)	Accuracy	51.04	3518
Image Classification	CIFAR-10 (test)	Accuracy	84.09	3381
Image Classification	ImageNet (val)	Top-1 Accuracy	70.3	354
Image Classification	CIFAR-100 (test)	--	--	72
Image Classification	WebVision 1.0 (val)	Top-1 Acc	75	59
Image Classification	WebVision (val)	Top-1 Acc	75.5	57
Image Classification	CIFAR100 Clean (test)	Accuracy	56.89	48
Image Classification	CIFAR-10 clean (test)	Test Accuracy	87.81	30
Image Classification	CIFAR-10-N-LT Imbalance Ratio 100	Accuracy (Noise 0.1)	79.02	20
Image Classification	CIFAR-10-N-LT Imbalance Ratio 10	Accuracy (NR 0.1)	0.8703	20

Showing 10 of 21 rows
