Rethinking Residual Errors in Compensation-based LLM Quantization

About

Methods based on weight compensation, which iteratively apply quantization and weight compensation to minimize the output error, have recently demonstrated remarkable success in quantizing Large Language Models (LLMs). The representative work, GPTQ, introduces several key techniques that make such iterative methods practical for LLMs with billions of parameters. GPTAQ extends this approach by introducing an asymmetric calibration process that aligns the output of each quantized layer with its full-precision counterpart, incorporating a residual error into the weight compensation framework. In this work, we revisit the formulation of the residual error. We identify a sub-optimal calibration objective in existing methods: during the intra-layer calibration process, they align the quantized output with the output from compensated weights, rather than the true output from the original full-precision model. Therefore, we redefine the objective to precisely align the quantized model's output with the original output of the full-precision model at each step. We then reveal that the residual error originates not only from the output difference of the preceding layer but also from the discrepancy between the compensated and original weights within each layer, which we name the 'compensation-aware error'. By inheriting the neuron decomposition technique from GPTAQ, we can efficiently incorporate this compensation-aware error into the weight update process. Extensive experiments on various LLMs and quantization settings demonstrate that our proposed enhancements integrate seamlessly with both GPTQ and GPTAQ, significantly improving their quantization performance. Our code is publicly available at https://github.com/list0830/ResComp.
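One way to see where a compensation-aware error could come from is to split the gap between the full-precision output and the quantized output into three parts. The sketch below is an illustration under stated assumptions, not the authors' formulation or the ResComp code. The names `W_comp` (weights after earlier compensation updates), `X_tilde` (layer input as produced by already-quantized preceding layers), the uniform `quantize` helper, and the random perturbations are all hypothetical stand-ins. It only checks the algebraic identity W X - Q X_tilde = (W - W_comp) X + W_comp (X - X_tilde) + (W_comp - Q) X_tilde, in which the first term vanishes only if the compensated weights equal the original ones.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def frob_sq(a):
    """Squared Frobenius norm."""
    return sum(x * x for row in a for x in row)


def random_matrix(rows, cols, rng, scale=1.0):
    """Gaussian matrix; used only to build toy data."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def quantize(w, step=0.25):
    """Hypothetical uniform round-to-nearest quantizer on a fixed grid."""
    return [[round(x / step) * step for x in row] for row in w]


rng = random.Random(0)
out_dim, in_dim, n_samples = 3, 4, 8

W = random_matrix(out_dim, in_dim, rng)    # original full-precision weights
X = random_matrix(in_dim, n_samples, rng)  # full-precision layer input

# Assumption: small random perturbations stand in for the real effects of
# quantized preceding layers (X_tilde) and of compensation updates (W_comp).
X_tilde = add(X, random_matrix(in_dim, n_samples, rng, 0.05))
W_comp = add(W, random_matrix(out_dim, in_dim, rng, 0.05))
Q = quantize(W_comp)                       # quantized weights

# Gap between the true full-precision output and the quantized output.
total = sub(matmul(W, X), matmul(Q, X_tilde))

compensation_term = matmul(sub(W, W_comp), X)    # compensated vs. original weights
input_term = matmul(W_comp, sub(X, X_tilde))     # preceding-layer output difference
rounding_term = matmul(sub(W_comp, Q), X_tilde)  # quantization of the current weights

recombined = add(add(compensation_term, input_term), rounding_term)
max_gap = max(abs(x - y)
              for rt, rr in zip(total, recombined)
              for x, y in zip(rt, rr))

print("max |total - recombined| =", max_gap)
print("squared norm of compensation term =", frob_sq(compensation_term))
```

In this toy setting, an objective that targets `W_comp` applied to `X` instead of `W` applied to `X` leaves the first term out of the error being minimized, which is the kind of discrepancy the abstract describes.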

Shuaiting Li, Juncan Deng, Kedong Xu, Rongtao Deng, Hong Gu, Minghan Jiang, Haibin Shen, Kejie Huang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Language Modeling | WikiText2 | Perplexity 5.19 | 3785 |
| Language Modeling | WikiText-2 (test) | PPL 7.3 | 2333 |
| Language Modeling | WikiText-2 | -- | 2320 |
| Language Modeling | C4 | Perplexity 7.28 | 1688 |
| Image Classification | ImageNet | Top-1 Accuracy 78.27 | 343 |
| Language Modeling | WikiText2 | Perplexity 5.03 | 277 |
| Zero-shot Evaluation | 6 zero-shot downstream tasks | Average Accuracy 71.09 | 70 |
| Zero-shot Question Answering and Commonsense Reasoning | Zero-shot Downstream Tasks (ARC, HellaSwag, WinoGrande, BoolQ, PiQA) | Average Accuracy (Zero-Shot) 77.7 | 48 |
| Natural Language Understanding and Reasoning | Standard Suite Zero-shot (PiQA, ARC-E, ARC-C, HellaSwag, WinoGrande, BoolQ) | PiQA Accuracy (Zero-shot) 78.67 | 39 |
| Zero-shot Evaluation | Downstream Tasks PiQA ARC Hellaswag Winogrande BoolQ | PiQA Accuracy (Zero-shot) 66.5 | 30 |
