Up or Down? Adaptive Rounding for Post-Training Quantization
About
When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of ResNet-18 and ResNet-50 to 4 bits while staying within an accuracy loss of 1%.
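The core idea above — deciding per weight whether to round up or down by minimizing a layer-wise reconstruction loss through a soft relaxation — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the hyperparameters (`iters`, `lr`, `lam`, `beta`) and the plain gradient-descent loop are assumptions for the sketch, and the rectified-sigmoid constants `zeta = 1.1`, `gamma = -0.1` follow common practice for this kind of relaxation.

```python
import numpy as np

def rectified_sigmoid(v, zeta=1.1, gamma=-0.1):
    """Soft rounding variable h(V) in [0, 1] (rectified sigmoid relaxation)."""
    return np.clip(1.0 / (1.0 + np.exp(-v)) * (zeta - gamma) + gamma, 0.0, 1.0)

def adaround_layer(W, X, n_bits=4, iters=1000, lr=1e-2, lam=1e-3, beta=2.0):
    """Learn a per-weight up/down rounding decision by minimizing the
    layer-wise local loss ||W X - W_q X||_F^2 plus a regularizer that
    pushes each soft rounding variable toward 0 or 1.
    (Hyperparameters here are illustrative, not the paper's settings.)"""
    zeta, gamma = 1.1, -0.1
    s = (W.max() - W.min()) / (2 ** n_bits - 1)   # uniform quantization step
    W_floor = np.floor(W / s)                      # round-down grid point
    V = np.zeros_like(W)                           # h(V) starts at 0.5
    for _ in range(iters):
        sig = 1.0 / (1.0 + np.exp(-V))
        h = np.clip(sig * (zeta - gamma) + gamma, 0.0, 1.0)
        W_q = s * (W_floor + h)                    # soft-quantized weights
        E = (W - W_q) @ X                          # reconstruction residual
        # d/dh of ||E||_F^2, via the chain rule through W_q = s*(W_floor + h)
        g_mse = -2.0 * s * (E @ X.T)
        # d/dh of the rounding regularizer sum(1 - |2h - 1|^beta)
        t = 2.0 * h - 1.0
        g_reg = -2.0 * beta * np.abs(t) ** (beta - 1.0) * np.sign(t)
        # back through the unclipped region of the rectified sigmoid
        dh_dv = sig * (1.0 - sig) * (zeta - gamma) * ((h > 0.0) & (h < 1.0))
        V -= lr * (g_mse + lam * g_reg) * dh_dv
    h = rectified_sigmoid(V)
    return s * (W_floor + (h >= 0.5))              # final hard up/down rounding
```

Note that only a batch of layer inputs `X` is needed (hence the small amount of unlabelled data), and each weight can move only to its floor or ceiling grid point, so the search space matches the binary optimization problem described above.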
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic Segmentation | ADE20K (val) | mIoU 33.49 | 2731 |
| Instance Segmentation | COCO 2017 (val) | -- | 1144 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy 75.84 | 512 |
| Language Modeling | WikiText | PPL 20.93 | 479 |
| Oriented Object Detection | DOTA v1.0 (test) | -- | 378 |
| Image Classification | ImageNet (val) | -- | 300 |
| Long-context Language Understanding | LongBench | M-Avg 11.55 | 219 |
| Reasoning | ARC Easy | Accuracy 70.83 | 183 |
| Reasoning | HellaSwag (HS) | Accuracy 61.24 | 142 |
| Common Sense Reasoning | BoolQ | Accuracy 73.79 | 131 |