Up or Down? Adaptive Rounding for Post-Training Quantization

About

When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of Resnet18 and Resnet50 to 4 bits while staying within an accuracy loss of 1%.
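The claim that rounding to nearest is not optimal can be checked on a toy layer. The following is a minimal sketch, not the paper's method: it replaces the soft relaxation with an exhaustive search over all up/down choices, which is only feasible for a handful of weights. The function names, the `scale` value, and the random calibration data are illustrative assumptions, and clipping to a fixed bit range is omitted.

```python
import itertools
import numpy as np

def layer_error(w_q, w, x):
    """Squared error between layer outputs with original and quantized weights."""
    return float(np.sum((x @ w - x @ w_q) ** 2))

def round_nearest(w, scale):
    """Baseline: snap every weight to its nearest grid point."""
    return scale * np.round(w / scale)

def round_best_up_down(w, x, scale):
    """Try every floor/ceil combination and keep the one with the lowest layer error.

    Brute force over 2**n candidates; a stand-in for the paper's soft
    relaxation, usable only for very small n.
    """
    floor = np.floor(w / scale)
    best_w, best_err = None, np.inf
    for bits in itertools.product((0.0, 1.0), repeat=w.size):
        cand = scale * (floor + np.array(bits))  # 0 = round down, 1 = round up
        err = layer_error(cand, w, x)
        if err < best_err:
            best_w, best_err = cand, err
    return best_w, best_err

# Hypothetical toy setup: one 6-weight linear unit and 32 calibration inputs.
rng = np.random.default_rng(0)
w = rng.normal(size=6)
x = rng.normal(size=(32, 6))
scale = 0.25  # assumed quantization step size

w_near = round_nearest(w, scale)
err_near = layer_error(w_near, w, x)
w_best, err_best = round_best_up_down(w, x, scale)

print("nearest rounding error:", err_near)
print("best up/down error:    ", err_best)
```

Because the nearest-rounded weights are one of the candidates in the search, the best up/down error can never exceed the nearest-rounding error. The search depends on the calibration inputs `x`, which reflects how the rounding choice adapts to the data rather than to each weight in isolation.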

Markus Nagel, Rana Ali Amjad, Mart van Baalen, Christos Louizos, Tijmen Blankevoort • 2020

Related benchmarks

Task                                 Dataset            Metric          Result  Rank
Language Modeling                    WikiText2          Perplexity      5.93    3785
Semantic segmentation                ADE20K (val)       mIoU            33.49   3069
Language Modeling                    C4                 Perplexity      8.34    1688
Instance Segmentation                COCO 2017 (val)    --              --      1275
Language Modeling                    WikiText           PPL             20.93   740
Image Classification                 ImageNet-1k (val)  Top-1 Accuracy  75.84   708
Oriented Object Detection            DOTA v1.0 (test)   --              --      395
Image Classification                 ImageNet (val)     --              --      300
Long-context Language Understanding  LongBench          M-Avg           11.55   294
Common Sense Reasoning               BoolQ              Accuracy        73.79   240
Showing 10 of 23 rows
