
LoopQ: Quantization for Recursive Transformers

About

Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization (PTQ). We present the first systematic study of quantization in LoopLMs and identify three challenges: distribution shift across roles, state reuse across loop transitions, and recursive error accumulation. To address these challenges, we propose LoopQ, a loop-aware PTQ framework that preserves a shared quantized backbone while introducing lightweight adaptations. LoopQ combines activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization to reduce distributional mismatch within loops and error accumulation across loops. Experiments across seven benchmarks show that, under W4A4 quantization, LoopQ improves average downstream accuracy by 68.8% and reduces average perplexity by 87.7% compared with the strongest static PTQ baseline.
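The interaction between weight reuse and quantization error described above can be illustrated with a toy experiment. The sketch below is not the LoopQ method: it applies plain symmetric round-to-nearest fake quantization to a single weight-shared block and tracks how far the quantized hidden state drifts from the full-precision one after each loop. The block (`tanh(Wx)`), the function names `fake_quant` and `loop_errors`, and all sizes are assumptions chosen for illustration.

```python
import math
import random

def fake_quant(values, bits=4):
    """Symmetric per-tensor round-to-nearest quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1              # 7 for signed 4-bit
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return list(values)                 # nothing to scale
    scale = peak / qmax
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def block(W, x):
    """Toy stand-in for a Transformer block (assumption): y = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def loop_errors(dim=16, loops=4, bits=4, seed=0):
    """L2 distance between full-precision and quantized states after each loop."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
         for _ in range(dim)]
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    Wq = [fake_quant(row, bits) for row in W]    # weights quantized once, per row
    h_fp, h_q = x, x
    errors = []
    for _ in range(loops):
        h_fp = block(W, h_fp)                    # full-precision trajectory
        h_q = block(Wq, fake_quant(h_q, bits))   # same shared block, W4A4-style
        errors.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(h_fp, h_q))))
    return errors

if __name__ == "__main__":
    for i, err in enumerate(loop_errors(), start=1):
        print(f"loop {i}: L2 error = {err:.4f}")
```

Because the quantized state from one loop is fed back into the same quantized block, each loop's error depends on all earlier ones. Printing the per-loop values lets one inspect that dependence on a small example.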

Rui Fang, Hsi-Wen Chen, Ming-Syan Chen • 2026

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText-2 | Perplexity (PPL): 11.38 | 2320
Commonsense Reasoning | WinoGrande | Accuracy: 67.8 | 1442
Multitask Language Understanding | MMLU | Accuracy: 70.07 | 263
Word Prediction | LAMBADA | Accuracy: 68.37 | 192
Zero-shot Prediction | HellaSwag | Zero-shot HellaSwag Accuracy: 75.55 | 43
Question Answering | ARC | Accuracy: 58.19 | 24
