
Class-Balanced Loss Based on Effective Number of Samples

About

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
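The formula in the abstract is the closed form of the geometric sum $1 + \beta + \dots + \beta^{n-1}$, so $\beta = 0$ gives an effective number of 1 for any non-empty class, and the value approaches $n$ as $\beta \to 1$. The following is a minimal sketch of how per-class weights could be derived from it. The abstract only says the effective number is used to re-balance the loss; weighting each class by the inverse of its effective number and rescaling the weights to sum to the number of classes are assumptions made here for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence


def effective_number(n: int, beta: float) -> float:
    """Effective number of samples: (1 - beta**n) / (1 - beta)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(samples_per_class: Sequence[int], beta: float) -> List[float]:
    """Per-class loss weights from the effective number of samples.

    Assumption (not stated in the abstract): each weight is proportional to
    1 / effective_number, rescaled so the weights sum to the class count.
    """
    if any(n < 1 for n in samples_per_class):
        raise ValueError("every class needs at least one sample")
    inverse = [1.0 / effective_number(n, beta) for n in samples_per_class]
    scale = len(inverse) / sum(inverse)
    return [w * scale for w in inverse]


if __name__ == "__main__":
    # Hypothetical long-tailed class counts, for illustration only.
    counts = [5000, 500, 50]
    for beta in (0.0, 0.9, 0.999):
        print(beta, class_balanced_weights(counts, beta))
```

Under these assumptions, $\beta = 0$ gives every class the same weight (no re-balancing), while values of $\beta$ close to 1 move the weights toward inverse class frequency. The resulting list could then be passed as per-class weights to a standard weighted loss.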

Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie • 2019

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | iNaturalist 2018 | Top-1 Accuracy 61.12 | 287 |
| Image Classification | ImageNet LT | Top-1 Accuracy 80.5 | 251 |
| Long-Tailed Image Classification | ImageNet-LT (test) | Top-1 Acc (Overall) 80.5 | 220 |
| Image Classification | CIFAR-10 long-tailed (test) | Top-1 Acc 89.9 | 201 |
| Image Classification | iNaturalist 2018 (test) | Top-1 Accuracy 61.12 | 192 |
| Image Classification | CIFAR-10-LT (test) | Top-1 Error 0.1252 | 185 |
| Image Classification | ImageNet-LT (test) | Top-1 Acc (All) 48.5 | 159 |
| Image Classification | ILSVRC 2012 (val) | -- | 156 |
| Image Classification | CIFAR100 long-tailed (test) | Accuracy 58 | 155 |
| Image Classification | CIFAR-100 Long-Tailed (test) | Top-1 Accuracy 59.8 | 149 |
Showing 10 of 190 rows
