
MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization

About

In this paper, we present a simple optimization-based preprocessing technique called Weight Magnitude Reduction (MagR) to improve the performance of post-training quantization. For each linear layer, we adjust the pre-trained floating-point weights by solving an $\ell_\infty$-regularized optimization problem. This process greatly diminishes the maximum magnitude of the weights and smooths out outliers, while preserving the layer's output. The preprocessed weights are centered more towards zero, which facilitates the subsequent quantization process. To implement MagR, we address the $\ell_\infty$-regularization by employing an efficient proximal gradient descent algorithm. Unlike existing preprocessing methods that involve linear transformations and subsequent post-processing steps, which can introduce significant overhead at inference time, MagR functions as a non-linear transformation, eliminating the need for any additional post-processing. This ensures that MagR introduces no overhead whatsoever during inference. Our experiments demonstrate that MagR achieves state-of-the-art performance on the Llama family of models. For example, we achieve a Wikitext2 perplexity of 5.95 on the LLaMA2-70B model for per-channel INT2 weight quantization without incurring any inference overhead.
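The preprocessing step described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the per-column objective takes the form $\min_w \tfrac{1}{2}\|Xw - Xw_0\|_2^2 + \alpha\|w\|_\infty$, where $X$ is a matrix of layer inputs and $w_0$ is one column of the pre-trained weights; the abstract states only that the problem is $\ell_\infty$-regularized and preserves the layer's output. The names `project_l1_ball`, `prox_linf`, `magnitude_reduce`, and the parameters `alpha` and `n_iter` are hypothetical. The proximal operator of the $\ell_\infty$ norm is computed through the Moreau decomposition, which reduces it to a Euclidean projection onto an $\ell_1$ ball.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {x : ||x||_1 <= radius} (sort-based)."""
    if radius <= 0:
        return np.zeros_like(v)
    abs_v = np.abs(v)
    if abs_v.sum() <= radius:
        return v.copy()
    u = np.sort(abs_v)[::-1]                 # magnitudes in descending order
    cssv = np.cumsum(u) - radius
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > cssv)[0][-1]    # last index that stays nonzero
    theta = cssv[rho] / (rho + 1.0)          # soft-threshold level
    return np.sign(v) * np.maximum(abs_v - theta, 0.0)

def prox_linf(v, lam):
    """Prox of lam * ||.||_inf, via Moreau: v - proj onto the l1 ball of radius lam."""
    return v - project_l1_ball(v, lam)

def magnitude_reduce(X, w0, alpha=1.0, n_iter=300):
    """Proximal gradient descent on 0.5*||X w - X w0||^2 + alpha*||w||_inf.

    Assumed objective (hypothetical form, see the lead-in), not the paper's code.
    """
    H = X.T @ X
    step = 1.0 / np.linalg.eigvalsh(H)[-1]   # 1 / Lipschitz constant of the gradient
    w = w0.copy()
    for _ in range(n_iter):
        grad = H @ (w - w0)                  # gradient of the output-matching term
        w = prox_linf(w - step * grad, step * alpha)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((16, 64))        # fewer samples than features: X has a null space
    w0 = rng.standard_normal(64)
    w0[3] = 8.0                              # inject an outlier weight
    w = magnitude_reduce(X, w0)
    print("max |w0|:", np.max(np.abs(w0)))
    print("max |w| :", np.max(np.abs(w)))
    print("relative output change:",
          np.linalg.norm(X @ (w - w0)) / np.linalg.norm(X @ w0))
```

Because the result is just a new weight vector of the same shape, nothing has to be undone or applied at inference time, which matches the abstract's claim of no inference overhead. With the step size set to the reciprocal of the Lipschitz constant, the iteration never increases the objective, so the maximum magnitude of `w` cannot exceed that of `w0`; how much it shrinks depends on `alpha` and on how rank-deficient `X` is.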

Aozhong Zhang, Naigang Wang, Yanxia Deng, Xin Li, Zi Yang, Penghang Yin • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Language Modeling | WikiText2 | Perplexity | 3.43 | 3785 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 6.02 | 2320 |
| Language Modeling | C4 (val) | PPL | 13.83 | 737 |
| Language Modeling | WikiText2 (val) | Perplexity (PPL) | 13.16 | 423 |
| Language Generation | WikiText2 | Perplexity | 3.58 | 287 |
| Language Generation | C4 | Perplexity | 5.72 | 190 |
| Language Understanding | MMLU 5-shot | Accuracy | 66.1 | 153 |
| Language Modeling | C4 | Perplexity | 7.59 | 72 |
| Zero-shot Classification | 5 zero-shot tasks | Accuracy | 62.69 | 55 |
| Zero-shot Evaluation | Zero-shot Evaluation Suite (ARC-c, ARC-e, HellaSwag, PIQA, WinoGrande) | ARC-c Accuracy | 39.59 | 52 |
Showing 10 of 19 rows
