LAQuant: A Simple Overhead-free Large Reasoning Model Quantization by Layer-wise Lookahead Loss
About
Large reasoning models (LRMs) reach competition-level math and coding accuracy via long autoregressive decoding, making per-token decoding cost a primary deployment concern. Weight quantization is the standard tool for acceleration, but representative recipes -- including state-of-the-art end-to-end (E2E) QAT -- lose accuracy on long-decoding reasoning benchmarks despite preserving perplexity and short-decode accuracy. Through a systematic gradient-direction analysis, we identify two factors driving this gap: (i) KV-cache fidelity preservation under the QAT loss, which E2E supervision attenuates via the softmax Fisher metric; and (ii) Hessian-subspace alignment between calibration data and the deployment distribution. We propose LookAhead Quantization (LAQuant), a layer-wise weight-only QAT method that addresses both factors without online-transform overhead by combining reasoning-domain calibration with a one-layer lookahead loss whose implicit cross-layer co-adaptation preserves the next-layer residual stream. For Qwen3-4B under W3G128 quantization, LAQuant improves AIME25 Pass@1 over ParoQuant by 15.11pp (1.93pp over ParoQuant++ at matched calibration) while achieving a 3.42x decoding speedup over FP16 on RTX A6000, compared with ParoQuant's 3.01x.
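To make the W3G128 setting concrete, the sketch below applies a quantize-dequantize round trip to a single weight row. It assumes the common reading of the notation, 3-bit weights with one scale and offset per group of 128 consecutive weights, and it uses a plain asymmetric min/max quantizer. That quantizer is an illustrative assumption, not LAQuant's trained quantizer, and the function names `fake_quantize_group` and `fake_quantize_row` are hypothetical.

```python
import random


def fake_quantize_group(weights, bits=3):
    """Quantize-dequantize one group with an asymmetric min/max grid (assumed scheme)."""
    levels = (1 << bits) - 1            # 7 steps, i.e. 8 representable values at 3 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 if the group is constant
    out = []
    for w in weights:
        q = round((w - lo) / scale)     # integer code
        q = max(0, min(levels, q))      # clamp to [0, levels]
        out.append(lo + q * scale)      # map the code back to a real value
    return out


def fake_quantize_row(row, bits=3, group_size=128):
    """Apply the group quantizer independently to each block of `group_size` weights."""
    out = []
    for start in range(0, len(row), group_size):
        out.extend(fake_quantize_group(row[start:start + group_size], bits))
    return out


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(512)]   # toy weights, not model data
deq = fake_quantize_row(row)

max_err = max(abs(a - b) for a, b in zip(row, deq))
distinct_per_group = [len(set(deq[i:i + 128])) for i in range(0, 512, 128)]
print("distinct values per group:", distinct_per_group)
print("max abs rounding error:", max_err)
```

Each group can take at most 8 distinct values, and the rounding error is bounded by half a quantization step of that group. A QAT method such as the one described above trains the weights under a quantizer of this general kind; the layer-wise lookahead loss itself is not shown here.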
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 5.8 | 3785 |
| Language Modeling | C4 | Perplexity | 6.94 | 1565 |
| Mathematical Reasoning | AIME 25 | Pass@1 Accuracy | 61.3 | 178 |
| General Reasoning | MMLU-Pro | Pass@1 Accuracy | 66.86 | 93 |
| Code Generation | LiveCodeBench | Pass@1 | 56.59 | 76 |
| General Reasoning | General Reasoning Suite Average | Pass@1 | 68.62 | 63 |
| Reasoning | LSAT | Pass@1 | 85.24 | 48 |
| Reasoning | GPQA | Pass@1 | 57.23 | 45 |
| Zero-shot Task Evaluation | ARC-C, ARC-E, BoolQ, and HellaSwag | Accuracy | 69.35 | 28 |
| Multi-subject Knowledge Reasoning | MMLU-Pro | Pass@1 | 71.52 | 28 |