LAQuant: A Simple Overhead-free Large Reasoning Model Quantization by Layer-wise Lookahead Loss

About

Large reasoning models (LRMs) reach competition-level math and coding accuracy via long autoregressive decoding, making per-token decoding cost a primary deployment concern. Weight quantization is the standard tool for acceleration, but representative recipes, including state-of-the-art end-to-end (E2E) quantization-aware training (QAT), lose accuracy on long-decoding reasoning benchmarks despite preserving perplexity and short-decode accuracy. Through a systematic gradient-direction analysis, we identify two factors driving this gap: (i) KV-cache fidelity preservation under the QAT loss, which E2E supervision attenuates via the softmax Fisher metric; and (ii) Hessian-subspace alignment between calibration data and the deployment distribution. We propose LookAhead Quantization (LAQuant), a layer-wise weight-only QAT method that addresses both factors without online-transform overhead by combining reasoning-domain calibration with a one-layer lookahead loss whose implicit cross-layer co-adaptation preserves the next-layer residual stream. For Qwen3-4B under W3G128 quantization, LAQuant improves AIME25 Pass@1 over ParoQuant by 15.11pp (1.93pp over ParoQuant++ at matched calibration) while achieving a 3.42x decoding speedup over FP16 on RTX A6000, compared with ParoQuant's 3.01x.
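For readers unfamiliar with the notation, W3G128 is read here as 3-bit weights with one scale and zero point shared by each group of 128 consecutive weights, which is the usual meaning in the weight-quantization literature. The sketch below shows plain asymmetric round-to-nearest group quantization under that assumption. It is not LAQuant: the abstract does not specify the quantizer or the lookahead loss in enough detail to reproduce, and the function names (`quantize_group`, `dequantize_group`, `fake_quantize_row`) are hypothetical.

```python
import random

def quantize_group(weights, bits=3):
    """Asymmetric round-to-nearest quantization of one weight group.

    Returns integer codes in [0, 2**bits - 1] plus the group's scale and
    zero point (here, the group minimum).
    """
    levels = (1 << bits) - 1              # 3 bits -> codes 0..7
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + zero for c in codes]

def fake_quantize_row(row, bits=3, group_size=128):
    """Quantize then dequantize a weight row, one group at a time."""
    out = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        codes, scale, zero = quantize_group(group, bits)
        out.extend(dequantize_group(codes, scale, zero))
    return out

# Toy weight row with two groups of 128 values (synthetic data, not model weights).
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]
row_q = fake_quantize_row(row)
max_err = max(abs(a - b) for a, b in zip(row, row_q))
print(f"max abs reconstruction error: {max_err:.6f}")
```

Each group of 128 weights ends up with at most 8 distinct values, and the per-weight rounding error is bounded by half the group's scale. Training-based methods such as the QAT approaches named in the abstract aim to do better than this rounding baseline.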

Euntae Choi, Sumin Song, Sungjoo Yoo • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity 5.8 | 3785 |
| Language Modeling | C4 | Perplexity 6.94 | 1565 |
| Mathematical Reasoning | AIME 25 | Pass@1 Accuracy 61.3 | 178 |
| General Reasoning | MMLU-Pro | Pass@1 Accuracy 66.86 | 93 |
| Code Generation | LiveCodeBench | Pass@1 56.59 | 76 |
| General Reasoning | General Reasoning Suite Average | Pass@1 68.62 | 63 |
| Reasoning | LSAT | Pass@1 85.24 | 48 |
| Reasoning | GPQA | Pass@1 57.23 | 45 |
| Zero-shot Task Evaluation | ARC-C, ARC-E, BoolQ, and HellaSwag | Accuracy 69.35 | 28 |
| Multi-subject Knowledge Reasoning | MMLU-Pro | Pass@1 71.52 | 28 |
Showing 10 of 23 rows
