
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

About

Hardware-friendly network quantization (e.g., binary or uniform quantization) can accelerate inference and reduce the memory consumption of deep neural networks, which is crucial for deploying models on resource-limited devices such as mobile phones. However, because low-bit quantization is discrete, existing quantization methods often suffer from unstable training and severe performance degradation. To address this problem, we propose Differentiable Soft Quantization (DSQ) to bridge the gap between full-precision and low-bit networks. DSQ evolves automatically during training to gradually approximate standard quantization. Because it is differentiable, DSQ helps obtain accurate gradients in backward propagation and, with an appropriate clipping range, reduces the quantization loss in the forward pass. Extensive experiments on several popular network structures show that training low-bit neural networks with DSQ consistently outperforms state-of-the-art quantization methods. In addition, our efficient implementation, the first for deploying 2- to 4-bit DSQ on devices with ARM architecture, achieves up to a 1.7$\times$ speedup over the open-source 8-bit high-performance inference framework NCNN [31].
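To make the idea of a soft quantizer that "gradually approximates the standard quantization" concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the functional form, so the piecewise tanh curve used here is an assumption, and the names `hard_quantize`, `soft_quantize`, and the sharpness parameter `k` are hypothetical. In this sketch `k` is a fixed positive number, whereas the abstract describes a quantizer that evolves during training.

```python
import math


def hard_quantize(x, lower, upper, bits):
    """Standard uniform quantization: clip to [lower, upper], then round
    to the nearest of the 2**bits evenly spaced levels."""
    intervals = 2 ** bits - 1
    delta = (upper - lower) / intervals
    x = min(max(x, lower), upper)
    return lower + delta * round((x - lower) / delta)


def soft_quantize(x, lower, upper, bits, k):
    """Differentiable stand-in for hard_quantize (assumed tanh form).

    k > 0 controls sharpness: a small k gives nearly the identity on
    [lower, upper]; a large k gives nearly the hard staircase.
    """
    intervals = 2 ** bits - 1
    delta = (upper - lower) / intervals
    x = min(max(x, lower), upper)
    # Index of the quantization interval that contains x.
    i = min(int((x - lower) / delta), intervals - 1)
    mid = lower + (i + 0.5) * delta
    # Scale so the curve passes exactly through both interval endpoints,
    # which keeps the function continuous across intervals.
    scale = 1.0 / math.tanh(0.5 * k * delta)
    phi = scale * math.tanh(k * (x - mid))  # lies in [-1, 1]
    return lower + delta * (i + 0.5 * (phi + 1.0))


if __name__ == "__main__":
    for k in (0.001, 2.0, 10.0, 100.0):
        print(k, soft_quantize(0.3, -1.0, 1.0, bits=2, k=k))
    print("hard", hard_quantize(0.3, -1.0, 1.0, bits=2))
```

Running the script shows the soft output moving from roughly the input value toward the hard-quantized level as `k` grows. This is one way to read the "bridge" between full-precision and low-bit behavior, under the assumptions stated above.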

Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Classification | CIFAR-10 (test) | Accuracy | 91.72 | 3381 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 72.8 | 1498 |
| Image Classification | ImageNet (val) | Top-1 Acc | 72.8 | 1206 |
| Image Classification | CIFAR-10 (test) | Accuracy | 91.7 | 882 |
| Image Classification | ImageNet ILSVRC-2012 (val) | Top-1 Accuracy | 63.71 | 441 |
| Image Classification | ImageNet-1k (val) | Top-1 Acc | 64.8 | 188 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 64.8 | 188 |
| Image Classification | ImageNet (val) | Accuracy | 72.8 | 115 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 73.39 | 26 |
| Zero-shot Language Understanding | Evaluation Suite Zero-shot (LMB, HellA, PIQA, ARC-E, ARC-C, WINO, Open, MMLU) | ARC-E Accuracy | 70.19 | 25 |
Showing 10 of 11 rows
