Class-Balanced Loss Based on Effective Number of Samples

About

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
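The effective-number formula and the re-weighting idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code. It assumes that each class weight is inversely proportional to that class's effective number and that the weights are normalized to sum to the number of classes. The abstract does not spell out either detail. The function names, the class counts, and the value of `beta` are hypothetical choices made for the example. Note the two limiting cases of the formula: with $\beta = 0$ every class has effective number 1 (no re-weighting), and as $\beta \to 1$ the effective number approaches $n$ (weighting by inverse class frequency).

```python
import numpy as np


def effective_number(n, beta):
    """Effective number of samples: (1 - beta**n) / (1 - beta), beta in [0, 1)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    n = np.asarray(n, dtype=np.float64)
    return (1.0 - np.power(beta, n)) / (1.0 - beta)


def class_balanced_weights(samples_per_class, beta):
    """Per-class loss weights from the effective number of samples.

    Assumption (not stated in the abstract): weights are proportional to
    1 / effective_number and rescaled to sum to the number of classes.
    Every class is assumed to have at least one sample.
    """
    eff = effective_number(samples_per_class, beta)
    weights = 1.0 / eff
    return weights * len(weights) / weights.sum()


# Hypothetical long-tailed class counts, for illustration only.
counts = [5000, 500, 50]
weights = class_balanced_weights(counts, beta=0.999)
print(weights)
```

The resulting weights could then multiply the per-class terms of a standard loss such as softmax cross-entropy, so that rare classes contribute more to the gradient than frequent ones.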

Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie • 2019

Related benchmarks

Task	Dataset	Metric	Value
Image Classification	iNaturalist 2018	Top-1 Accuracy	61.12	291
Image Classification	ImageNet LT	Top-1 Accuracy	80.5	264
Long-Tailed Image Classification	ImageNet-LT (test)	Top-1 Acc (Overall)	80.5	246
Image Classification	CIFAR-100 Long-Tailed (test)	Top-1 Accuracy	59.8	234
Image Classification	iNaturalist 2018 (test)	Top-1 Accuracy	61.12	223
Image Classification	CIFAR-10 long-tailed (test)	Top-1 Acc	89.9	211
Image Classification	CIFAR-10-LT (test)	Top-1 Error	0.1252	185
Image Classification	ImageNet-LT (test)	Top-1 Acc (All)	48.5	159
Image Classification	ILSVRC 2012 (val)	--	--	156
Image Classification	CIFAR100 long-tailed (test)	Accuracy	58	155

Showing 10 of 215 rows
