
Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

About

Recent progress in large language models (LLMs) has focused on test-time scaling to improve reasoning via increased inference computation, but often at the cost of efficiency. We revisit test-time behavior and uncover a simple yet underexplored phenomenon: reasoning uncertainty is highly localized; only a small subset of high-entropy tokens dominantly affects output correctness. Motivated by this, we propose Minimal Test-Time Intervention (MTI), a training-free framework that enhances reasoning accuracy and stability with minimal overhead. MTI includes: (i) Selective CFG intervention, applying classifier-free guidance only at uncertain positions; and (ii) Lightweight negative-prompt guidance, reusing the main model's KV cache to approximate unconditional decoding efficiently. MTI yields consistent gains across general, coding, and STEM tasks (e.g., a +9.28% average improvement on six benchmarks for DeepSeek-R1-7B and +11.25% on AIME2024 with Ling-mini-2.0) while remaining highly efficient.
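The selective intervention described above can be sketched as an entropy gate over classifier-free guidance: compute the entropy of the model's conditional next-token distribution, and only blend in the unconditional (negative-prompt) logits when that entropy crosses a threshold. The sketch below is illustrative, not the paper's implementation; the threshold `tau` and guidance scale `gamma` are hypothetical values, and real use would operate on model logits rather than toy lists.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def selective_cfg(cond_logits, uncond_logits, tau=1.0, gamma=1.5):
    """Entropy-gated classifier-free guidance (illustrative sketch).

    Low-entropy positions are left untouched, so the guidance cost is
    paid only at the small subset of uncertain tokens. `tau` (entropy
    threshold) and `gamma` (guidance scale) are hypothetical defaults.
    """
    if entropy(softmax(cond_logits)) <= tau:
        # Confident token: skip intervention, use conditional logits as-is.
        return cond_logits
    # Uncertain token: extrapolate away from the unconditional logits.
    return [u + gamma * (c - u) for c, u in zip(cond_logits, uncond_logits)]
```

In this sketch the unconditional logits stand in for the paper's lightweight negative-prompt branch; at confident positions the function returns the conditional logits unchanged, so the extra forward pass can be skipped entirely.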

Zhen Yang, Mingyang Zhang, Feng Chen, Ganggui Ding, Liang Hou, Xin Tao, Ying-Cong Chen • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | BRUMO25 | – | 37 |
| Mathematical Reasoning | HMMT25 | Avg@8 Score 53.33 | 20 |
| Mathematical Reasoning | AIME 24 | Mean@8 Accuracy 81.67 | 9 |
| Mathematical Reasoning | AIME 25 | Mean@8 Accuracy 73.95 | 9 |
