
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

About

Post-training quantization (PTQ) compresses the weights and activations of large language models (LLMs) into low-precision representations to reduce memory footprint and accelerate inference. However, the presence of outliers in weights and activations often leads to large quantization errors and severe accuracy degradation, especially in recent reasoning LLMs where errors accumulate across long chains of thought. Existing PTQ methods either fail to sufficiently suppress outliers or introduce significant overhead during inference. In this paper, we propose Pairwise Rotation Quantization (ParoQuant), a PTQ method that combines hardware-efficient and optimizable independent Givens rotations with channel-wise scaling to even out the magnitudes across channels and narrow the dynamic range within each quantization group, effectively addressing the outlier issue. We further co-design the inference kernel to fully exploit GPU parallelism and keep the rotations and scaling lightweight at runtime. Under weight-only quantization, ParoQuant achieves an average 2.4% accuracy improvement over AWQ on reasoning tasks, with less than 10% overhead. ParoQuant also matches the accuracy of state-of-the-art weight-activation quantization methods. This paves the way for more efficient and accurate deployment of reasoning LLMs.
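To illustrate the core idea, here is a minimal NumPy sketch (not the paper's actual algorithm or kernel) of a single Givens rotation mixing an outlier channel with a regular channel. The rotation is orthogonal, so it preserves total energy, but it evens out the two channels' magnitudes and narrows the dynamic range that a shared quantization group must cover; the balancing angle formula is a standard derivation, not taken from the paper.

```python
import numpy as np

def givens(w, i, j, theta):
    """Rotate columns i and j of w by angle theta (an orthogonal transform)."""
    c, s = np.cos(theta), np.sin(theta)
    out = w.copy()
    out[:, i] = c * w[:, i] - s * w[:, j]
    out[:, j] = s * w[:, i] + c * w[:, j]
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 2))
w[:, 0] *= 20.0  # channel 0 carries outlier magnitudes

# Angle that equalizes the two column norms after rotation:
#   tan(2*theta) = (|w_i|^2 - |w_j|^2) / (2 * <w_i, w_j>)
a, b = w[:, 0] @ w[:, 0], w[:, 1] @ w[:, 1]
d = w[:, 0] @ w[:, 1]
theta = 0.5 * np.arctan2(a - b, 2 * d)

w_rot = givens(w, 0, 1, theta)

print("max |w| before rotation:", np.abs(w).max())
print("max |w| after rotation :", np.abs(w_rot).max())
```

Because the rotation narrows the group's max-abs value, a symmetric quantizer gets a finer step size for the same bit width. ParoQuant applies many such independent pairwise rotations (plus channel-wise scaling) and optimizes the angles, with the inverse transform fused into the inference kernel.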

Yesheng Liang, Haisheng Chen, Zihan Zhang, Song Han, Zhijian Liu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Language Modeling | WikiText-2 (test) | PPL | 2.82 | 1541 |
| Commonsense Reasoning | HellaSwag | Accuracy | 60.7 | 1460 |
| Language Modeling | C4 | Perplexity | 7.36 | 1182 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 6.62 | 841 |
| Language Modeling | C4 (val) | PPL | 5.86 | 392 |
| Question Answering | BoolQ | Accuracy | 89.1 | 240 |
| Multiple-choice Question Answering | ARC Easy | Accuracy | 84.3 | 122 |
| Reasoning | GPQA Diamond | Accuracy | 63.5 | 88 |
| Language Modeling | C4 | C4 Loss | 7.44 | 73 |
| Reasoning | MMLU-Pro | Accuracy | 77.5 | 50 |

Showing 10 of 17 rows.
