
Training-Free Looped Transformers

About

We introduce training-free looped transformers, in which a lightweight inference-time wrapper loops a contiguous mid-stack block of layers of a frozen checkpoint without additional fine-tuning, continued training, or architectural changes. Unlike prior looped transformer methods that train with the looped structure end-to-end, we retrofit recurrence onto pretrained models at test time. We show that naive block reapplication usually degrades performance, highlighting the importance of the loop application strategy. Motivated by viewing a pre-norm transformer block as a forward Euler step on an ODE, we instead treat looping as a refinement of the same approximation, replacing one large update with smaller damped sub-steps. Across seven dense, sparse MoE, and MLA+MoE model families, our method improves Qwen3-4B-Instruct by +2.64 pp on MMLU-Pro, Qwen3-30B-A3B-Instruct by +1.14 pp on CommonsenseQA, and Moonlight-16B-A3B-Instruct by +1.20 pp on OpenBookQA.
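The forward Euler view above can be illustrated with a toy sketch. A pre-norm residual block computes x + f(x), which is one Euler step of size 1 on dx/dt = f(x). Reapplying the block K times integrates out to t = K, while K sub-steps of size 1/K stay at t = 1 and only refine the approximation. Everything in the code is an assumption made for illustration: the abstract does not give the damping schedule, so the uniform 1/K step size, the function names, and the linear `toy_residual` (a stand-in for the residual branch of a frozen mid-stack block) are hypothetical and are not the authors' implementation.

```python
import math
from typing import Callable, List

Vector = List[float]
Residual = Callable[[Vector], Vector]


def euler_step(f: Residual, x: Vector, step: float) -> Vector:
    """Forward Euler update x + step * f(x); step=1.0 matches a plain residual block."""
    return [xi + step * fi for xi, fi in zip(x, f(x))]


def naive_loop(f: Residual, x: Vector, n_loops: int) -> Vector:
    """Reapply the full-size block update n_loops times (total step size n_loops)."""
    for _ in range(n_loops):
        x = euler_step(f, x, 1.0)
    return x


def damped_loop(f: Residual, x: Vector, n_loops: int) -> Vector:
    """Replace one full update with n_loops sub-steps of size 1/n_loops (total step size 1).

    The uniform 1/n_loops damping is an assumption, not the paper's stated schedule.
    """
    step = 1.0 / n_loops
    for _ in range(n_loops):
        x = euler_step(f, x, step)
    return x


def toy_residual(x: Vector) -> Vector:
    """Hypothetical stand-in for a frozen block's residual branch: f(x) = -0.5 * x."""
    return [-0.5 * xi for xi in x]


if __name__ == "__main__":
    x0 = [1.0, -2.0]
    # Exact solution of dx/dt = -0.5 * x at t = 1, used as the reference point.
    exact = [xi * math.exp(-0.5) for xi in x0]
    print("exact      ", exact)
    print("single step", euler_step(toy_residual, x0, 1.0))
    print("naive x4   ", naive_loop(toy_residual, x0, 4))
    print("damped x4  ", damped_loop(toy_residual, x0, 4))
```

In this toy setting the damped loop lands closer to the exact t = 1 solution than the single full step, while naive reapplication overshoots it. This is consistent with the abstract's observation that naive block reapplication usually degrades performance, though a linear scalar map says nothing quantitative about real checkpoints.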

Lizhang Chen, Jonathan Li, Chen Liang, Ni Lao, Qiang Liu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multiple-choice Question Answering | HellaSwag | Accuracy | 77.93 | 196 |
| Multiple-choice Question Answering | SciQ | Accuracy | 95 | 91 |
| Multiple-choice Question Answering | MMLU zero-shot (test) | Accuracy (MMLU zero-shot) | 68.6 | 27 |
| Multiple-choice Question Answering | SuperGPQA MCQA | Accuracy | 31.7 | 21 |
| Multiple-choice Question Answering | CSQA | Accuracy | 79.85 | 9 |
| Multiple-choice Question Answering | ARC Challenge 25-shot (test) | Accuracy | 58.79 | 4 |
| Multiple-choice Question Answering | OpenBookQA 0-shot (test) | Accuracy | 33.2 | 4 |
| Multiple-choice Question Answering | TruthfulQA | -- | -- | 4 |
| Language Modeling | LAMBADA | Perplexity (PPL) | 4.11 | 3 |
| Multiple-choice Question Answering | ARC Easy | Accuracy | 79.38 | 3 |
Showing 10 of 13 rows
