Billion-scale semi-supervised learning for image classification
About
This paper presents a study of semi-supervised learning with large convolutional networks. We propose a pipeline, based on a teacher/student paradigm, that leverages a large collection of unlabelled images (up to 1 billion). Our main goal is to improve the performance for a given target architecture, like ResNet-50 or ResNeXt. We provide an extensive analysis of the success factors of our approach, which leads us to formulate some recommendations to produce high-accuracy models for image classification with semi-supervised learning. As a result, our approach brings important gains to standard architectures for image, video and fine-grained classification. For instance, by leveraging one billion unlabelled images, our learned vanilla ResNet-50 achieves 81.2% top-1 accuracy on the ImageNet benchmark.
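The abstract does not spell out how the teacher's predictions become training data for the student, so the following is only a minimal sketch of one plausible step in a teacher/student pipeline: ranking unlabelled images by the teacher's score for each class and keeping the top `k` per class as pseudo-labelled examples. The function name `select_top_k_per_class`, the parameter `k`, and the toy scores are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
def select_top_k_per_class(teacher_scores, k):
    """Pick, for each class, the k unlabelled images the teacher scores highest.

    teacher_scores: list of per-image score lists, one score per class
                    (hypothetical teacher outputs, e.g. softmax probabilities).
    Returns a dict mapping class index -> list of image indices, best first.
    """
    num_images = len(teacher_scores)
    num_classes = len(teacher_scores[0])
    selected = {}
    for c in range(num_classes):
        # Rank all images by the teacher's score for class c, highest first.
        ranked = sorted(range(num_images),
                        key=lambda i: teacher_scores[i][c],
                        reverse=True)
        selected[c] = ranked[:k]
    return selected


# Toy example: 4 unlabelled images, 2 classes (made-up scores).
teacher_scores = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
    [0.3, 0.7],
]
pseudo_labelled = select_top_k_per_class(teacher_scores, k=2)
```

In a setup like this, the student would then be trained on the selected images with their class index as the label and afterwards fine-tuned on the original labelled set; that ordering is likewise an assumption of this sketch rather than something stated in the abstract.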
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 84.8 | 1453 |
| Image Classification | ImageNet (val) | Top-1 Acc | 81.2 | 1206 |
| Fine-grained Image Classification | CUB200 2011 (test) | Accuracy | 84.8 | 536 |
| Image Classification | ImageNet | Top-1 Accuracy | 84.8 | 429 |
| Image Classification | ImageNet ILSVRC-2012 (val) | Top-1 Accuracy | 84.8 | 405 |
| Video Recognition | Kinetics (val) | Top-1 Accuracy | 76.7 | 36 |
| Instance Segmentation | Habitat Gibson Generalization 1.0 (val) | Mask AP50 | 31.89 | 10 |
| Object Detection | Habitat Gibson Specialization 1.0 (val) | Box AP50 | 34.11 | 10 |
| Instance Segmentation | Habitat Gibson Specialization 1.0 (val) | Mask AP50 | 31.23 | 10 |
| Object Detection | Habitat Gibson Generalization 1.0 (val) | AP50 (Box) | 33.41 | 10 |