Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

About

Improving the reasoning abilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Looped transformers address this by performing multiple latent iterations to refine each token beyond a single forward pass. However, we identify a latent overthinking phenomenon: most token predictions are already correct after the first pass, but some are revised into errors in later iterations. We ask whether selectively skipping latent iterations can improve accuracy, and an oracle iteration policy reveals significant potential, boosting performance by up to 7.3%. Motivated by this, we propose Think-at-Hard (TaH), a looped transformer optimized for selective iteration. TaH employs a lightweight neural decider to trigger latent iterations only at tokens likely to be incorrect after the standard forward pass. During latent iterations, depth-aware Low-Rank Adaptation (LoRA) modules shift the objective from general next-token prediction to focused hard-token refinement. A duo-causal attention mechanism extends attention from the token sequence dimension to an additional iteration depth dimension, enabling cross-iteration information flow with full sequential parallelism. Experiments on nine benchmarks show consistent gains across math, QA, and coding tasks. With identical parameter counts, TaH outperforms always-iterate baselines by 3.8-4.4% while skipping iterations on 93% of tokens, and exceeds single-iteration Qwen3 baselines by 3.0-3.8%. When fewer than 3% additional parameters are allowed for the LoRA modules and the decider, the gains increase further to 5.3-6.2% and 6.1-6.8%, respectively. Our code is available at https://github.com/thu-nics/TaH.
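The selective-iteration idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `selective_iterate`, the confidence-based `toy_decider`, the sharpening `toy_refine`, the `threshold`, and `max_depth` are all hypothetical stand-ins chosen for illustration. TaH itself uses a learned neural decider, depth-aware LoRA modules, and duo-causal attention, none of which are modeled here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def selective_iterate(
    hidden: Sequence[Vector],
    decider: Callable[[Vector], float],
    refine: Callable[[Vector, int], Vector],
    threshold: float = 0.5,
    max_depth: int = 2,
) -> Tuple[List[Vector], List[int]]:
    """Refine only the tokens the decider flags as hard.

    Returns the final per-token states and the iteration depth each token reached.
    """
    states = [list(h) for h in hidden]
    depths = [1] * len(states)  # every token receives the standard first pass
    for depth in range(2, max_depth + 1):
        for i, state in enumerate(states):
            # Continue only tokens that reached the previous depth
            # and are still scored above the threshold.
            if depths[i] == depth - 1 and decider(state) > threshold:
                states[i] = refine(state, depth)
                depths[i] = depth
    return states, depths


def toy_decider(state: Vector) -> float:
    # Hypothetical hardness score: low top-1 probability means "hard".
    return 1.0 - max(state)


def toy_refine(state: Vector, depth: int) -> Vector:
    # Hypothetical refinement: sharpen the distribution and renormalize.
    # `depth` is unused here; a depth-aware model would condition on it.
    squared = [p * p for p in state]
    total = sum(squared)
    return [p / total for p in squared]


# Token 0 is confident after the first pass; token 1 is not.
first_pass = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25]]
states, depths = selective_iterate(first_pass, toy_decider, toy_refine)
print(depths)
```

In this toy run only the second token is iterated, so the first token's prediction cannot be revised into an error by a later pass, which is the overthinking failure mode the abstract describes.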

Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH500 (test)	--	922
Mathematical Reasoning	MATH 500	Accuracy 60.9	589
Code Generation	MBPP+	Accuracy 68.1	243
General Reasoning	MMLU-Pro	Accuracy 21.9	213
Math Reasoning	MATH500	Accuracy 85.8	127
Mathematical Problem Solving	MATH500	Accuracy 85.6	96
Mathematical Problem Solving	AIME 25	Accuracy 30.4	84
Math Reasoning	OlympiadBench	Accuracy 52.6	76
Science Reasoning	GPQA	Accuracy (GPQA) 49	72
Code Reasoning	HumanEval	HumanEval Score 43.6	70

Showing 10 of 18 rows
