
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

About

Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make this scheme efficient, the per-worker workload must be large, which implies nontrivial growth in the SGD minibatch size. In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization. Specifically, we show no loss of accuracy when training with large minibatch sizes up to 8192 images. To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training. With these simple techniques, our Caffe2-based system trains ResNet-50 with a minibatch size of 8192 on 256 GPUs in one hour, while matching small minibatch accuracy. Using commodity hardware, our implementation achieves ~90% scaling efficiency when moving from 8 to 256 GPUs. Our findings enable training visual recognition models on internet-scale data with high efficiency.
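The two techniques named in the abstract, the linear scaling rule and learning-rate warmup, can be sketched roughly as follows. This is a minimal illustration, not the authors' Caffe2 implementation: the function names, the reference minibatch size of 256, the base learning rate of 0.1, and the linear ramp shape are assumptions made for the example and are not stated in the abstract.

```python
def scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Linear scaling rule: multiply the learning rate by the same
    factor as the minibatch size (hypothetical helper)."""
    return base_lr * batch / base_batch


def warmup_lr(step: int, warmup_steps: int, start_lr: float, target_lr: float) -> float:
    """Linear warmup: ramp from start_lr to target_lr over warmup_steps
    iterations, then hold target_lr (hypothetical helper; the ramp shape
    is an assumption)."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


# Assumed reference setting: learning rate 0.1 at minibatch size 256.
target = scaled_lr(base_lr=0.1, base_batch=256, batch=8192)

# Assumed warmup length of 100 iterations, chosen only for illustration.
schedule = [warmup_lr(s, 100, 0.1, target) for s in (0, 50, 100, 200)]
print(target, schedule)
```

Under these assumptions, a minibatch 32 times larger than the reference gets a learning rate 32 times larger, and the warmup keeps early iterations from starting at that full rate.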

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He • 2017

Related benchmarks

Task                  | Dataset                                        | Result               | Rank
Image Classification  | CIFAR-100 (test)                               | Accuracy 81.45       | 3518
Image Classification  | CIFAR-10 (test)                                | Accuracy 97.3        | 3381
Image Classification  | ImageNet (val)                                 | Top-1 Acc 78.7       | 1206
Image Classification  | ImageNet (test)                                | --                   | 235
Image Classification  | ImageNet 2012 (val)                            | Top-1 Accuracy 62.9  | 202
Image Classification  | ImageNet (val)                                 | Top-1 Accuracy 75.2  | 118
Image Classification  | ImageNet 1% labeled                            | Top-5 Accuracy 48.4  | 118
Image Classification  | ImageNet (10% labels)                          | --                   | 98
Image Classification  | ImageNet-1k (val)                              | Top-1 Accuracy 77.7  | 25
Image Classification  | ImageNet natural distribution shift v2 (test)  | Accuracy 63.8        | 19
Showing 10 of 18 rows
