Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
About
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating-point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy after quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
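The core of such a scheme is an affine mapping r ≈ S·(q − Z) between real values r and integers q, with a float scale S and an integer zero-point Z; once weights and activations are quantized, inner products can be accumulated entirely in integer arithmetic and rescaled once at the end. The sketch below illustrates this idea in NumPy; the function names and the per-tensor min/max calibration are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def quantize(x, num_bits=8):
    # Illustrative affine quantization r ~= S * (q - Z) to unsigned ints,
    # with scale/zero-point derived from the tensor's min/max range.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def int_dot(qa, za, qb, zb):
    # Integer-only accumulation: sum of (qa - za) * (qb - zb) in int32,
    # as an integer MAC unit would compute it.
    return int(np.sum((qa.astype(np.int32) - za) * (qb.astype(np.int32) - zb)))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(64), rng.standard_normal(64)
qa, sa, za = quantize(a)
qb, sb, zb = quantize(b)
# The only floating-point step is the final rescale of the int32 accumulator.
approx = sa * sb * int_dot(qa, za, qb, zb)
```

In a real integer-only kernel the final rescale by `sa * sb` would itself be done with a fixed-point multiplier and bit shift rather than a float multiply; the float form above is kept only for clarity.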
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 70.9 | 1453 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 67.3 | 1206 |
| Instance Segmentation | COCO 2017 (val) | -- | -- | 1144 |
| Visual Question Answering | TextVQA | Accuracy | 84.8 | 1117 |
| Image Super-resolution | Manga109 | PSNR | 30.95 | 656 |
| Object Detection | COCO (val) | mAP | 40.4 | 613 |
| Single Image Super-Resolution | Urban100 | PSNR | 26.49 | 500 |
| Single Image Super-Resolution | Set5 | PSNR | 32.39 | 352 |
| OCR Evaluation | OCRBench | Score | 848 | 296 |
| Single Image Super-Resolution | Set14 | PSNR | 28.77 | 252 |