Big Transfer (BiT): General Visual Representation Learning
About
Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19-task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct a detailed analysis of the main components that lead to high transfer performance.
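The "simple heuristic" mentioned above (called BiT-HyperRule in the paper) selects fine-tuning hyperparameters from the target dataset size alone, avoiding per-task hyperparameter search. The sketch below illustrates the idea; the thresholds, step counts, and the `transfer_schedule` name are illustrative assumptions, not the paper's verbatim rule.

```python
def transfer_schedule(num_examples: int) -> tuple[int, bool]:
    """Pick (fine-tuning steps, whether to use MixUp) from dataset size.

    Illustrative sketch of a BiT-HyperRule-style heuristic: small datasets
    get a short schedule and no MixUp; larger datasets get longer schedules
    and MixUp regularization. Exact values here are assumptions.
    """
    if num_examples < 20_000:        # small regime, e.g. few-shot transfer
        return 500, False
    elif num_examples < 500_000:     # medium regime, e.g. CIFAR-scale
        return 10_000, True
    else:                            # large regime, e.g. full ILSVRC-2012
        return 20_000, True
```

For example, a 5,000-image target task would get the short 500-step schedule without MixUp, while full ILSVRC-2012 (~1.3M images) would get the long schedule with MixUp.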
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | Accuracy | 93.51 | 3518 |
| Image Classification | CIFAR-10 (test) | -- | -- | 3381 |
| Object Detection | COCO 2017 (val) | AP | 43.8 | 2454 |
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy | 85.4 | 1866 |
| Image Classification | ImageNet-1k (val) | -- | -- | 1453 |
| Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy (%) | 87.54 | 1155 |
| Image Classification | CIFAR-10 (test) | Accuracy | 97.47 | 906 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 85.4 | 840 |
| Image Classification | ImageNet 1k (test) | -- | -- | 798 |
| Image Classification | CIFAR-100 | -- | -- | 622 |