Self Supervision to Distillation for Long-Tailed Visual Recognition
About
Deep learning has achieved remarkable progress for visual recognition on large-scale balanced datasets but still performs poorly on real-world long-tailed data. Previous methods often adopt class re-balanced training strategies to alleviate the imbalance issue, but they risk over-fitting the tail classes. The recent decoupling method avoids this over-fitting by using a multi-stage training scheme, yet it still fails to capture tail-class information in the feature learning stage. In this paper, we show that soft labels can serve as a powerful means of incorporating label correlation into a multi-stage training scheme for long-tailed recognition. The intrinsic relations between classes embodied by soft labels turn out to be helpful for long-tailed recognition by transferring knowledge from head to tail classes. Specifically, we propose a conceptually simple yet particularly effective multi-stage training scheme, termed Self Supervision to Distillation (SSD). This scheme is composed of two parts. First, we introduce a self-distillation framework for long-tailed recognition, which can mine the label relations automatically. Second, we present a new distillation label generation module guided by self-supervision. The distilled labels integrate information from both the label and data domains, which models the long-tailed distribution effectively. We conduct extensive experiments, and our method achieves state-of-the-art results on three long-tailed recognition benchmarks: ImageNet-LT, CIFAR100-LT, and iNaturalist 2018. Our SSD outperforms the strong LWS baseline by $2.7\%$ to $4.5\%$ across datasets. The code is available at https://github.com/MCG-NJU/SSD-LT.
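To make the soft-label idea concrete, below is a minimal NumPy sketch of a standard soft-label distillation objective (temperature-scaled KL term from teacher soft labels plus cross-entropy on the hard labels). This is an illustration of the generic technique only; SSD's actual loss, its multi-stage schedule, and the self-supervision-guided label generation module differ, so see the paper and the linked code for the real implementation. All function and parameter names here (`distillation_loss`, `alpha`, `T`) are hypothetical.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T produces softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      alpha=0.5, T=2.0):
    """Generic soft-label distillation: alpha-weighted sum of
    (i) KL(teacher || student) at temperature T, which carries the
        inter-class relations encoded in the teacher's soft labels, and
    (ii) cross-entropy with the ground-truth hard labels.
    The T**2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                axis=-1).mean()
    log_p = np.log(softmax(student_logits) + 1e-12)
    ce = -log_p[np.arange(len(hard_labels)), hard_labels].mean()
    return alpha * (T ** 2) * kl + (1.0 - alpha) * ce
```

When the student matches the teacher, the KL term vanishes and only the hard-label cross-entropy remains; a student that disagrees with both the teacher and the ground truth is penalized on both terms.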
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | iNaturalist 2018 | Top-1 Accuracy | 71.5 | 287 |
| Image Classification | ImageNet LT | Top-1 Accuracy | 56 | 251 |
| Long-Tailed Image Classification | ImageNet-LT (test) | Top-1 Acc (Overall) | 56 | 220 |
| Image Classification | ImageNet-LT (test) | Top-1 Acc (All) | 56 | 159 |
| Long-tailed Visual Recognition | ImageNet LT | Overall Accuracy | 56 | 89 |
| Long-Tailed Image Classification | iNaturalist 2018 | Accuracy | 71.5 | 82 |
| Image Classification | CIFAR-100-LT IF 100 (test) | Top-1 Acc | 46 | 77 |
| Image Classification | CIFAR-100-LT Imbalance Ratio 100 (test) | Accuracy | 46 | 62 |
| Image Classification | CIFAR-100-LT Imbalance Ratio 50 (test) | Accuracy | 50.5 | 62 |
| Image Classification | CIFAR-100-LT Imbalance Ratio 10 (test) | Accuracy | 62.3 | 59 |