Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
About
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
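The core idea of such integer-arithmetic-only schemes is an affine mapping between real values and quantized integers, `r = S * (q - Z)`, where `S` is a real-valued scale and `Z` an integer zero-point. A minimal sketch of this mapping (the parameter names `scale`, `zero_point`, and the uint8 range are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # Affine quantization: q = clamp(round(x / scale) + zero_point)
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # Inverse mapping r = scale * (q - zero_point); recovers an
    # approximation of the original real value.
    return scale * (q.astype(np.int32) - zero_point)

# Example: quantize real values in roughly [-1, 1] to uint8.
x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 2.0 / 255, 128
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
```

Note that the real value 0 maps exactly onto the integer `zero_point`, so zero-padding and ReLU clamping incur no quantization error; the round-trip error for other values is bounded by the scale.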
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 5.94 | 1949 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 70.9 | 1469 |
| Visual Question Answering | TextVQA | Accuracy | 84.8 | 1285 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 67.3 | 1206 |
| Instance Segmentation | COCO 2017 (val) | -- | -- | 1201 |
| Image Super-Resolution | Manga109 | PSNR | 30.95 | 821 |
| Object Detection | COCO (val) | mAP | 40.4 | 633 |
| Single Image Super-Resolution | Urban100 | PSNR | 26.49 | 500 |
| Language Modeling | WikiText-2 (val) | Perplexity (PPL) | 5.94 | 387 |
| Visual Question Answering | ChartQA | Accuracy | 89.8 | 371 |