Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
About
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
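The core idea of such integer-arithmetic-only schemes is an affine mapping between real values and quantized integers, `r = S * (q - Z)`, where `S` is a real-valued scale and `Z` an integer zero-point. A minimal sketch of this mapping (the parameter names `scale`, `zero_point`, and the uint8 range are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # Affine quantization: q = clamp(round(x / scale) + zero_point)
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # Inverse mapping r = scale * (q - zero_point); recovers an
    # approximation of the original real value.
    return scale * (q.astype(np.int32) - zero_point)

# Example: quantize real values in roughly [-1, 1] to uint8.
x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 2.0 / 255, 128
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
```

Note that the real value 0 maps exactly onto the integer `zero_point`, so zero-padding and ReLU clamping incur no quantization error; the round-trip error for other values is bounded by the scale.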
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 5.94 | 1949 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 70.9 | 1469 |
| Visual Question Answering | TextVQA | Accuracy | 84.8 | 1285 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 67.3 | 1206 |
| Instance Segmentation | COCO 2017 (val) | -- | -- | 1201 |
| Image Super-Resolution | Manga109 | PSNR | 30.95 | 821 |
| Object Detection | COCO (val) | mAP | 40.4 | 633 |
| Single Image Super-Resolution | Urban100 | PSNR | 26.49 | 500 |
| Language Modeling | WikiText-2 (val) | Perplexity (PPL) | 5.94 | 387 |
| Visual Question Answering | ChartQA | Accuracy | 89.8 | 371 |