Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

About

Improving reasoning abilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Looped transformers address this by performing multiple latent iterations to refine each token beyond a single forward pass. However, we identify a latent overthinking phenomenon: most token predictions are already correct after the first pass, but are sometimes revised into errors in later iterations. In this work, we ask whether selectively skipping latent iterations may improve accuracy. We reveal significant potential with an oracle iteration policy that boosts model performance by up to 7.3%. Motivated by this, we propose Think-at-Hard (TaH), a looped transformer optimized for selective iteration. TaH employs a lightweight neural decider to trigger latent iteration only at tokens that are likely incorrect after the standard forward pass. During latent iterations, depth-aware Low-Rank Adaptation (LoRA) modules shift the LLM's objective from general next-token prediction to focused hard-token refinement. A duo-causal attention mechanism extends attention from the token sequence dimension to an additional iteration depth dimension, enabling cross-iteration information flow with full sequential parallelism. Experiments on nine benchmarks show consistent gains across math, QA, and coding tasks. With identical parameter counts, TaH outperforms always-iterate baselines by 3.8-4.4% while skipping iterations on 93% of tokens, and exceeds single-iteration Qwen3 baselines by 3.0-3.8%. When allowing <3% more parameters from LoRA and decider modules, the gains further increase to 5.3-6.2% and 6.1-6.8%, respectively. Our code is available at https://github.com/thu-nics/TaH.
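The selective-iteration idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `selective_iterate`, the `threshold` parameter, the toy decider scores, and the `toy_refine` stand-in are all hypothetical names introduced here, and the sketch works on already-decoded tokens rather than hidden states. It omits the depth-aware LoRA modules and the duo-causal attention entirely, and only shows the control flow: a decider score gates whether a position receives a second pass.

```python
from typing import Callable, List, Sequence, Tuple


def selective_iterate(
    predictions: Sequence[str],
    hard_scores: Sequence[float],
    refine: Callable[[int, str], str],
    threshold: float = 0.5,
) -> Tuple[List[str], List[int]]:
    """Refine only positions whose decider score reaches the threshold.

    Returns the final predictions and the number of passes used per position.
    All names here are illustrative assumptions, not the paper's API.
    """
    if len(predictions) != len(hard_scores):
        raise ValueError("predictions and hard_scores must have equal length")
    final: List[str] = []
    passes: List[int] = []
    for pos, (token, score) in enumerate(zip(predictions, hard_scores)):
        if score >= threshold:
            # Likely-wrong token: spend one extra iteration on it.
            final.append(refine(pos, token))
            passes.append(2)
        else:
            # Likely-correct token: keep the first-pass prediction.
            final.append(token)
            passes.append(1)
    return final, passes


# Toy data (hypothetical): first-pass predictions and decider scores.
first_pass = ["2", "+", "2", "=", "5"]
scores = [0.02, 0.01, 0.03, 0.05, 0.91]


def toy_refine(pos: int, token: str) -> str:
    # Stand-in for a second latent iteration. It fixes the one wrong token
    # but corrupts tokens that were already right, mimicking overthinking.
    return "4" if token == "5" else token + "?"


final, passes = selective_iterate(first_pass, scores, toy_refine)
skipped_fraction = passes.count(1) / len(passes)

# Always-iterate baseline for comparison: every position is refined.
always = [toy_refine(i, t) for i, t in enumerate(first_pass)]
```

In this toy setup the selective policy repairs the last token while leaving the four correct tokens alone, whereas the always-iterate baseline also rewrites the tokens that were already correct, which is the failure mode the abstract calls latent overthinking.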

Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang • 2025

Related benchmarks

Task                          Dataset         Result                Rank
Mathematical Reasoning        MATH500 (test)  --                    895
Code Generation               MBPP+           Accuracy 68.1         236
Mathematical Reasoning        MATH 500        Accuracy 60.9         221
General Reasoning             MMLU-Pro        Accuracy 21.9         201
Mathematical Problem Solving  MATH500         Accuracy 85.6         83
Math Reasoning                MATH500         Accuracy 85.8         83
Math Reasoning                OlympiadBench   Accuracy 52.6         76
Science Reasoning             GPQA            Accuracy (GPQA) 49    72
Mathematical Problem Solving  AIME 25         Accuracy 30.4         71
Code Reasoning                HumanEval       HumanEval Score 43.6  62
Showing 10 of 18 rows
