LoopQ: Quantization for Recursive Transformers
About
Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization (PTQ). We present the first systematic study of quantization in LoopLMs and identify three challenges: distribution shift across roles, state reuse across loop transitions, and recursive error accumulation. To address these challenges, we propose LoopQ, a loop-aware PTQ framework that preserves a shared quantized backbone while introducing lightweight adaptations. LoopQ combines activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization to reduce distributional mismatch within loops and error accumulation across loops. Experiments across seven benchmarks show that, under W4A4 quantization, LoopQ improves average downstream accuracy by 68.8% and reduces average perplexity by 87.7% compared with the strongest static PTQ baseline.
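To make the recursive error accumulation challenge concrete, here is a minimal toy sketch. It is not the LoopQ method. It only shows how quantization error can be measured when one shared block is applied repeatedly, using plain symmetric round-to-nearest 4-bit quantization for both weights and the reused state (a simple stand-in for W4A4). The names `fake_quantize` and `looped_forward`, the `tanh` residual block, and the dimensions are all illustrative assumptions, not taken from the paper.

```python
import math
import random

def fake_quantize(values, bits=4):
    """Symmetric round-to-nearest quantization of a vector, then dequantization."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit signed integers
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax              # one scale per vector (assumed granularity)
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def looped_forward(h, W, n_loops, bits=None):
    """Apply one shared toy block n_loops times; return the state after each loop."""
    if bits is not None:
        W = [fake_quantize(row, bits) for row in W]   # quantize weights once, per row
    states = []
    for _ in range(n_loops):
        # The state produced by the previous loop is re-quantized before reuse.
        a = h if bits is None else fake_quantize(h, bits)
        update = [math.tanh(sum(w * x for w, x in zip(row, a))) for row in W]
        h = [hi + ui for hi, ui in zip(h, update)]    # residual kept in full precision
        states.append(h)
    return states

random.seed(0)
dim, n_loops = 8, 4                     # hypothetical toy sizes
W = [[random.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]
h0 = [random.gauss(0.0, 1.0) for _ in range(dim)]

ref = looped_forward(h0, W, n_loops)              # full-precision trajectory
quant = looped_forward(h0, W, n_loops, bits=4)    # quantized trajectory

# Mean absolute deviation between the two trajectories after each loop.
drift = [sum(abs(r - q) for r, q in zip(rs, qs)) / dim for rs, qs in zip(ref, quant)]
for i, d in enumerate(drift, start=1):
    print(f"loop {i}: mean |error| = {d:.4f}")
```

Because the same quantized weights and a re-quantized state feed every iteration, the deviation at loop k depends on the errors from all earlier loops. Tracking `drift` per loop is one simple way to observe that dependence in a toy setting; whether it grows, and how fast, depends on the block and the data.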
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 11.38 | 2320 |
| Commonsense Reasoning | WinoGrande | Accuracy | 67.8 | 1442 |
| Multitask Language Understanding | MMLU | Accuracy | 70.07 | 263 |
| Word Prediction | LAMBADA | Accuracy | 68.37 | 192 |
| Zero-shot Prediction | HellaSwag | Zero-shot HellaSwag Accuracy | 75.55 | 43 |
| Question Answering | ARC | Accuracy | 58.19 | 24 |