LoopQ: Quantization for Recursive Transformers

About

Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization (PTQ). We present the first systematic study of quantization in LoopLMs and identify three challenges: distribution shift across roles, state reuse across loop transitions, and recursive error accumulation. To address these challenges, we propose LoopQ, a loop-aware PTQ framework that preserves a shared quantized backbone while introducing lightweight adaptations. LoopQ combines activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization to reduce distributional mismatch within loops and error accumulation across loops. Experiments across seven benchmarks show that, under W4A4 quantization, LoopQ improves average downstream accuracy by 68.8% and reduces average perplexity by 87.7% compared with the strongest static PTQ baseline.
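As a rough illustration of the recursive error accumulation mentioned above, the toy sketch below reuses one small residual block for several loops and compares a full-precision trajectory against one where weights and activations pass through plain 4-bit round-to-nearest fake quantization. This is not LoopQ and not any baseline from the paper: the block, the quantizer, and every name here (`fake_quant`, `block`, `loop_drift`) are assumptions made for illustration only.

```python
import math
import random


def fake_quant(values, bits=4):
    """Symmetric per-vector round-to-nearest quantize, then dequantize."""
    qmax = 2 ** (bits - 1) - 1              # 7 for signed 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    # Python's round() resolves exact ties to the nearest even integer.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def block(weights, x):
    """Hypothetical shared block: residual connection plus tanh(W x)."""
    h = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return [xi + math.tanh(hi) for xi, hi in zip(x, h)]


def rel_error(ref, approx):
    """Relative L2 distance between two state vectors."""
    num = math.sqrt(sum((r - a) ** 2 for r, a in zip(ref, approx)))
    den = math.sqrt(sum(r * r for r in ref))
    return num / den


def loop_drift(dim=16, loops=4, bits=4, seed=0):
    """Relative error of the quantized state after each reuse of the block."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(dim)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]
    # Weights are quantized once (per output row) and shared by every loop.
    weights_q = [fake_quant(row, bits) for row in weights]
    x_fp = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x_q = list(x_fp)
    errors = []
    for _ in range(loops):
        x_fp = block(weights, x_fp)
        # Activations are re-quantized at every loop entry, so the error
        # from earlier loops is carried into later ones.
        x_q = block(weights_q, fake_quant(x_q, bits))
        errors.append(rel_error(x_fp, x_q))
    return errors


if __name__ == "__main__":
    for step, err in enumerate(loop_drift(), start=1):
        print(f"loop {step}: relative error {err:.4f}")
```

Running the script prints the relative error after each loop. Because the quantized state from one loop is the input to the next, any mismatch introduced early is carried forward, which is the effect a loop-aware method would aim to control.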

Rui Fang, Hsi-Wen Chen, Ming-Syan Chen • 2026

Related benchmarks

Task	Dataset	Metric	Result
Language Modeling	WikiText-2	Perplexity (PPL)	11.38	2862
Commonsense Reasoning	WinoGrande	Accuracy	67.8	1581
Multitask Language Understanding	MMLU	Accuracy	70.07	263
Word Prediction	LAMBADA	Accuracy	68.37	222
Question Answering	ARC	Accuracy	58.19	45
Zero-shot Prediction	HellaSwag	Zero-shot HellaSwag Accuracy	75.55	43

