Self-supervised Pretraining of Visual Features in the Wild
About
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL, and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, namely the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to this expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 accuracy with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl
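SEER is pretrained with the SwAV objective (online clustering with a swapped-prediction loss) scaled up to a large RegNetY backbone; the full, distributed implementation lives in the VISSL repository linked above. Below is a minimal single-GPU sketch of that objective, assuming two augmented views per image. The `SwAVSketch` module and the `sinkhorn`/`swav_loss` helpers are illustrative names, not VISSL's API, and the small torchvision `regnet_y_1_6gf` backbone stands in for the 1.3B-parameter model, which is far too large to instantiate casually.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import regnet_y_1_6gf


@torch.no_grad()
def sinkhorn(scores: torch.Tensor, eps: float = 0.05, n_iters: int = 3) -> torch.Tensor:
    """Sinkhorn-Knopp iteration: turn prototype scores (B, K) into soft assignments."""
    q = torch.exp(scores / eps).t()  # (K, B)
    q /= q.sum()
    n_protos, n_samples = q.shape
    for _ in range(n_iters):
        q /= q.sum(dim=1, keepdim=True)  # normalize over samples for each prototype
        q /= n_protos
        q /= q.sum(dim=0, keepdim=True)  # normalize over prototypes for each sample
        q /= n_samples
    return (q * n_samples).t()  # (B, K); each row is a soft cluster assignment


class SwAVSketch(nn.Module):
    """Backbone + projection head + prototypes. A small torchvision RegNetY
    stands in for SEER's 1.3B-parameter RegNetY."""

    def __init__(self, feat_dim: int = 128, n_prototypes: int = 3000):
        super().__init__()
        backbone = regnet_y_1_6gf(weights=None)
        in_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep pooled features, drop the classifier
        self.backbone = backbone
        self.projector = nn.Linear(in_dim, feat_dim)
        self.prototypes = nn.Linear(feat_dim, n_prototypes, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = F.normalize(self.projector(self.backbone(x)), dim=1)
        return self.prototypes(z)  # similarity scores against K prototypes


def swav_loss(model: SwAVSketch, view_a: torch.Tensor, view_b: torch.Tensor,
              temp: float = 0.1) -> torch.Tensor:
    """Swapped prediction: each view's cluster assignment supervises the other view."""
    scores_a, scores_b = model(view_a), model(view_b)
    with torch.no_grad():  # assignments act as targets; no gradient flows through them
        q_a, q_b = sinkhorn(scores_a), sinkhorn(scores_b)
    loss_a = -(q_b * F.log_softmax(scores_a / temp, dim=1)).sum(dim=1).mean()
    loss_b = -(q_a * F.log_softmax(scores_b / temp, dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_a + loss_b)


if __name__ == "__main__":
    model = SwAVSketch()
    view_a = torch.randn(8, 3, 224, 224)  # two random augmentations of the same batch
    view_b = torch.randn(8, 3, 224, 224)
    loss = swav_loss(model, view_a, view_b)
    loss.backward()
    print(f"swapped-prediction loss: {loss.item():.4f}")
```

The actual training recipe additionally uses multi-crop augmentation, periodic L2-normalization of the prototype weights, and distributed training across the 512 GPUs mentioned above; the sketch omits these for brevity.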
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Detection | COCO 2017 (val) | -- | 2454 |
| Image Classification | ImageNet (val) | Top-1 Accuracy: 84.2% | 1206 |
| Instance Segmentation | COCO 2017 (val) | -- | 1144 |
| Object Detection | COCO (val) | mAP: 41.6 | 613 |
| Instance Segmentation | COCO (val) | AP (mask): 37.6 | 472 |
| Instance Segmentation | COCO | AP (mask): 43.2 | 279 |
| Object Detection | COCO | AP (box): 48.5 | 144 |
| Image Classification | ImageNet (1% labeled) | -- | 118 |
| Image Classification | Places205 (val) | Top-1 Accuracy: 56% | 68 |
| Image Classification | VOC 2007 (test) | mAP: 89.4 | 67 |