
Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers

About

Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions to regulate reasoning behavior, yet we find that such mode switching is largely driven by a small set of trigger tokens rather than the instructions themselves. Through attention analysis and controlled prompting experiments, we show that a leading "Okay" token induces reasoning behavior, while the newline pattern following "</think>" suppresses it. Based on this observation, we propose Mid-Think, a simple training-free prompting format that combines these triggers to achieve intermediate-budget reasoning, consistently outperforming fixed-token and prompt-based baselines on the accuracy-length trade-off. Furthermore, applying Mid-Think to RL training after SFT reduces training time by approximately 15% while improving the final performance of Qwen3-8B on AIME from 69.8% to 72.4% and on GPQA from 58.5% to 61.1%, demonstrating its effectiveness for both inference-time control and RL-based reasoning training.
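The trigger mechanism described above can be illustrated with a minimal sketch. Note that the exact prompt template, the trigger strings, and the `build_prefill` helper below are assumptions inferred from the abstract, not the authors' released format:

```python
# Hypothetical sketch of Mid-Think-style assistant-turn prefills.
# The trigger strings below are inferred from the abstract, which reports
# that a leading "Okay" token induces reasoning while the newline pattern
# after "</think>" suppresses it; the exact strings are an assumption.

THINK_TRIGGER = "Okay"            # assumed reasoning-inducing leading token
SUPPRESS_TRIGGER = "</think>\n\n"  # assumed reasoning-suppressing pattern


def build_prefill(mode: str) -> str:
    """Return an assistant-turn prefill for a hybrid reasoning model.

    mode: "think"     -> reasoning induced by the leading "Okay" token
          "no_think"  -> reasoning suppressed by the "</think>" pattern
          "mid_think" -> both triggers combined, aiming at a shorter
                         but nonzero (intermediate-budget) reasoning trace
    """
    if mode == "think":
        return THINK_TRIGGER
    if mode == "no_think":
        return SUPPRESS_TRIGGER
    if mode == "mid_think":
        # Combine the suppressing pattern with the inducing token.
        return SUPPRESS_TRIGGER + THINK_TRIGGER
    raise ValueError(f"unknown mode: {mode}")
```

In use, the chosen prefill would be appended to the assistant turn before generation, so the model continues from the trigger rather than choosing its own opening tokens.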

Wang Yang, Debargha Ganguly, Xinpeng Li, Chaoda Song, Shouren Wang, Vikash Singh, Vipin Chaudhary, Xiaotian Han • 2026

Related benchmarks

Task                    Dataset                Metric    Result  Rank
Mathematical Reasoning  MATH500 (test)         Accuracy  94.1    381
Mathematical Reasoning  AIME No-think (test)   Accuracy  62.9    8
Mathematical Reasoning  AIME Think (test)      Accuracy  72.4    8
Question Answering      GPQA No-think (test)   Accuracy  47.8    8
Question Answering      GPQA Think (test)      Accuracy  61.1    8
