Big Transfer (BiT): General Visual Representation Learning

About

Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.
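The abstract reports results in two terms that are easy to pin down in code: few-shot regimes ("10 examples per class") and top-1 accuracy. The following is a minimal sketch of both, not code from the BiT authors. The function names `sample_k_per_class` and `top1_accuracy`, the seeding scheme, and the plain-list data layout are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_k_per_class(labels, k, seed=0):
    """Return sorted indices of a subset holding at most k examples per class.

    Hypothetical helper: illustrates a "k examples per class" regime only.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Classes with fewer than k examples contribute all of them.
        chosen.extend(rng.sample(indices, min(k, len(indices))))
    return sorted(chosen)


def top1_accuracy(scores, labels):
    """Percentage of examples whose highest-scoring class is the true label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Usage: a toy 3-class label list and made-up per-class scores.
labels = [0, 0, 0, 1, 1, 2]
subset = sample_k_per_class(labels, k=2)
scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
accuracy = top1_accuracy(scores, [0, 1, 1])
```

In the usage lines, the third example is scored highest for class 0 while its label is 1, so two of three predictions count as correct.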

Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby • 2019

Related benchmarks

| Task                 | Dataset                | Result                    | Rank |
|----------------------|------------------------|---------------------------|------|
| Image Classification | CIFAR-100 (test)       | Accuracy: 93.51           | 3518 |
| Image Classification | CIFAR-10 (test)        | --                        | 3381 |
| Object Detection     | COCO 2017 (val)        | AP: 43.8                  | 2843 |
| Image Classification | ImageNet-1K 1.0 (val)  | Top-1 Accuracy: 85.4      | 2238 |
| Image Classification | ImageNet-1k (val)      | --                        | 1498 |
| Classification       | ImageNet-1K 1.0 (val)  | Top-1 Accuracy (%): 87.54 | 1171 |
| Image Classification | ImageNet-1k (val)      | Top-1 Accuracy: 85.4      | 920  |
| Image Classification | CIFAR-10 (test)        | Accuracy: 97.47           | 906  |
| Image Classification | ImageNet 1k (test)     | --                        | 880  |
| Image Classification | ImageNet-1k (val)      | Top-1 Accuracy: 87.5      | 708  |
Showing 10 of 49 rows
