
Exploring the Limits of Weakly Supervised Pretraining

About

State-of-the-art visual perception models for a wide range of tasks rely on supervised pretraining. ImageNet classification is the de facto pretraining task for these models. Yet, ImageNet is now nearly ten years old and is by modern standards "small". Even so, relatively little is known about the behavior of pretraining with datasets that are multiple orders of magnitude larger. The reasons are obvious: such datasets are difficult to collect and annotate. In this paper, we present a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images. Our experiments demonstrate that training for large-scale hashtag prediction leads to excellent results. We show improvements on several image classification and object detection tasks, and report the highest ImageNet-1k single-crop, top-1 accuracy to date: 85.4% (97.6% top-5). We also perform extensive experiments that provide novel empirical data on the relationship between large-scale pretraining and transfer learning performance.
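The two-stage recipe the abstract describes, pretraining a classifier on a large weakly labeled source task and then transferring it to a smaller target task, can be sketched at toy scale. The snippet below is a minimal NumPy illustration, not the paper's method: it stands in for the convolutional backbone with a single linear layer, for hashtag supervision with noisy synthetic labels, and for fine-tuning with training a fresh softmax head on the frozen pretrained features. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.5, steps=200):
    """Plain softmax regression trained by full-batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    losses = []
    for _ in range(steps):
        P = softmax(X @ W)
        losses.append(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))
        W -= lr * X.T @ (P - Y) / len(y)
    return W, losses

# Stage 1: "pretraining" on a large, weakly labeled source set.
# Toy stand-in for hashtag supervision: many samples, 20% label noise.
d, n_hashtags = 20, 10
X_pre = rng.normal(size=(2000, d))
W_true = rng.normal(size=(d, n_hashtags))
y_pre = (X_pre @ W_true).argmax(axis=1)
noisy = rng.random(len(y_pre)) < 0.2
y_pre[noisy] = rng.integers(0, n_hashtags, noisy.sum())
W_pre, pre_losses = train_softmax(X_pre, y_pre, n_hashtags)

# Stage 2: transfer. Freeze the pretrained layer, use its outputs as
# features, and train only a new classification head on a small target set.
n_target = 5
X_tgt = rng.normal(size=(200, d))
y_tgt = (X_tgt @ W_true[:, :n_target]).argmax(axis=1)
feats = X_tgt @ W_pre                              # frozen "backbone"
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
W_head, tgt_losses = train_softmax(feats, y_tgt, n_target)
```

Even in this toy setting, the target head trains on features shaped by the noisy source task rather than on raw inputs, which is the essence of the transfer-learning experiments the paper scales up to billions of images.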

Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 84.2 | 1453
Image Classification | ImageNet (val) | Top-1 Accuracy | 85.4 | 1206
Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy (%) | 86.06 | 1155
Fine-grained Image Classification | CUB-200-2011 (test) | Accuracy | 83.2 | 536
Image Classification | ImageNet | Top-1 Accuracy | 86.4 | 429
Image Classification | ImageNet ILSVRC-2012 (val) | Top-1 Accuracy | 85.4 | 405
Image Classification | ImageNet-ReaL | Precision@1 | 88.19 | 195
Image Classification | ImageNet-A (test) | Top-1 Accuracy | 61 | 154
Classification | ImageNet 1k (test val) | Top-1 Accuracy | 85.4 | 138
Image Classification | ImageNet-C (test) | mCE (Mean Corruption Error) | 51.7 | 110

(Showing 10 of 19 rows.)
