Training-Free Looped Transformers

About

We introduce training-free looped transformers, in which a lightweight inference-time wrapper loops a contiguous mid-stack block of layers of a frozen checkpoint without additional fine-tuning, continued training, or architectural changes. Unlike prior looped transformer methods that train with the looped structure end-to-end, we retrofit recurrence onto pretrained models at test time. We show that naive block reapplication usually degrades performance, highlighting the importance of the loop application strategy. Motivated by viewing a pre-norm transformer block as a forward Euler step on an ODE, we instead treat looping as a refinement of the same approximation, replacing one large update with smaller damped sub-steps. Across seven dense, sparse MoE, and MLA+MoE model families, our method improves Qwen3-4B-Instruct by +2.64 pp on MMLU-Pro, Qwen3-30B-A3B-Instruct by +1.14 pp on CommonsenseQA, and Moonlight-16B-A3B-Instruct by +1.20 pp on OpenBookQA.
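The contrast between naive reapplication and damped sub-steps can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the residual form x + f(x) for a pre-norm block, uses a made-up stand-in `residual_branch` for f, and assumes a uniform step size of 1/K for K sub-steps, which the abstract does not specify. All function names here are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Branch = Callable[[Vector], Vector]


def residual_branch(x: Vector) -> Vector:
    """Toy stand-in for the residual branch f(x) of a pre-norm block (hypothetical)."""
    # RMS-normalise the input, then apply a fixed nonlinearity.
    rms = math.sqrt(sum(v * v for v in x) / len(x)) or 1.0
    return [math.tanh(v / rms) for v in x]


def single_step(x: Vector, f: Branch) -> Vector:
    """One ordinary block application: x + f(x), i.e. a forward Euler step of size 1."""
    return [xi + fi for xi, fi in zip(x, f(x))]


def naive_loop(x: Vector, f: Branch, n_loops: int) -> Vector:
    """Reapply the full-size update n_loops times (total step size n_loops)."""
    for _ in range(n_loops):
        x = single_step(x, f)
    return x


def damped_loop(x: Vector, f: Branch, n_loops: int) -> Vector:
    """Replace one full update with n_loops sub-steps of size 1/n_loops.

    The uniform 1/n_loops damping is an assumption made for illustration.
    """
    h = 1.0 / n_loops
    for _ in range(n_loops):
        x = [xi + h * fi for xi, fi in zip(x, f(x))]
    return x


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


x0 = [1.0, -2.0, 0.5]
naive = naive_loop(x0, residual_branch, n_loops=4)
damped = damped_loop(x0, residual_branch, n_loops=4)
drift_naive = distance(naive, x0)
drift_damped = distance(damped, x0)
```

In this toy setting the naive loop moves the hidden state several full updates away from where the frozen model expects it, while the damped loop keeps the total step size at 1 and only refines how that step is taken. With `n_loops=1` the damped loop reduces to the ordinary single block application.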

Lizhang Chen, Jonathan Li, Chen Liang, Ni Lao, Qiang Liu • 2026

Related benchmarks

Task	Dataset	Metric	Value
Multiple-choice Question Answering	HellaSwag	Accuracy	77.93	212
Multiple-choice Question Answering	SciQ	Accuracy	95	91
Language Modeling	LAMBADA	Perplexity (PPL)	4.11	27
Multiple-choice Question Answering	MMLU zero-shot (test)	Accuracy (MMLU zero-shot)	68.6	27
Multiple-choice Question Answering	SuperGPQA MCQA	Accuracy	31.7	21
Multiple-choice Question Answering	TruthfulQA	Accuracy (MC1 Delta)	34.64	12
Multiple-choice Question Answering	CSQA	Accuracy	79.85	9
Multiple-choice Question Answering	ARC Challenge 25-shot (test)	Accuracy	58.79	4
Multiple-choice Question Answering	OpenBookQA 0-shot (test)	Accuracy	33.2	4
Multiple-choice Question Answering	ARC Easy	Accuracy	79.38	3

Showing 10 of 13 rows
