Self-supervised Pretraining of Visual Features in the Wild
About
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL, and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, namely the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to this expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 accuracy with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl
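SEER is pretrained with the SwAV objective (online clustering with a swapped-prediction loss) scaled up to a large RegNetY backbone; the full, distributed implementation lives in the VISSL repository linked above. Below is a minimal single-GPU sketch of that objective, assuming two augmented views per image. The `SwAVSketch` module and the `sinkhorn`/`swav_loss` helpers are illustrative names, not VISSL's API, and the small torchvision `regnet_y_1_6gf` backbone stands in for the 1.3B-parameter model, which is far too large to instantiate casually.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import regnet_y_1_6gf


@torch.no_grad()
def sinkhorn(scores: torch.Tensor, eps: float = 0.05, n_iters: int = 3) -> torch.Tensor:
    """Sinkhorn-Knopp iteration: turn prototype scores (B, K) into soft assignments."""
    q = torch.exp(scores / eps).t()  # (K, B)
    q /= q.sum()
    n_protos, n_samples = q.shape
    for _ in range(n_iters):
        q /= q.sum(dim=1, keepdim=True)  # normalize over samples for each prototype
        q /= n_protos
        q /= q.sum(dim=0, keepdim=True)  # normalize over prototypes for each sample
        q /= n_samples
    return (q * n_samples).t()  # (B, K); each row is a soft cluster assignment


class SwAVSketch(nn.Module):
    """Backbone + projection head + prototypes. A small torchvision RegNetY
    stands in for SEER's 1.3B-parameter RegNetY."""

    def __init__(self, feat_dim: int = 128, n_prototypes: int = 3000):
        super().__init__()
        backbone = regnet_y_1_6gf(weights=None)
        in_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep pooled features, drop the classifier
        self.backbone = backbone
        self.projector = nn.Linear(in_dim, feat_dim)
        self.prototypes = nn.Linear(feat_dim, n_prototypes, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = F.normalize(self.projector(self.backbone(x)), dim=1)
        return self.prototypes(z)  # similarity scores against K prototypes


def swav_loss(model: SwAVSketch, view_a: torch.Tensor, view_b: torch.Tensor,
              temp: float = 0.1) -> torch.Tensor:
    """Swapped prediction: each view's cluster assignment supervises the other view."""
    scores_a, scores_b = model(view_a), model(view_b)
    with torch.no_grad():  # assignments act as targets; no gradient flows through them
        q_a, q_b = sinkhorn(scores_a), sinkhorn(scores_b)
    loss_a = -(q_b * F.log_softmax(scores_a / temp, dim=1)).sum(dim=1).mean()
    loss_b = -(q_a * F.log_softmax(scores_b / temp, dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_a + loss_b)


if __name__ == "__main__":
    model = SwAVSketch()
    view_a = torch.randn(8, 3, 224, 224)  # two random augmentations of the same batch
    view_b = torch.randn(8, 3, 224, 224)
    loss = swav_loss(model, view_a, view_b)
    loss.backward()
    print(f"swapped-prediction loss: {loss.item():.4f}")
```

The actual training recipe additionally uses multi-crop augmentation, periodic L2-normalization of the prototype weights, and distributed training across the 512 GPUs mentioned above; the sketch omits these for brevity.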
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Detection | COCO 2017 (val) | -- | 2454 |
| Image Classification | ImageNet (val) | Top-1 Accuracy: 84.2% | 1206 |
| Instance Segmentation | COCO 2017 (val) | -- | 1144 |
| Object Detection | COCO (val) | mAP: 41.6 | 613 |
| Instance Segmentation | COCO (val) | AP (mask): 37.6 | 472 |
| Instance Segmentation | COCO | AP (mask): 43.2 | 279 |
| Object Detection | COCO | AP (box): 48.5 | 144 |
| Image Classification | ImageNet (1% labeled) | -- | 118 |
| Image Classification | Places205 (val) | Top-1 Accuracy: 56% | 68 |
| Image Classification | VOC 2007 (test) | mAP: 89.4 | 67 |