ThinkBrake: Efficient Reasoning via Log-Probability Margin Guided Decoding

About

Large Reasoning Models (LRMs) allocate substantial inference-time compute to Chain-of-Thought (CoT) reasoning, improving performance on mathematics, scientific QA, and tool usage. However, this introduces overthinking: LRMs often reach a correct intermediate solution, keep reasoning, and then overwrite it with an incorrect answer. We first show that oracle stopping (injecting </think> at every sentence boundary and selecting the best stopping point in hindsight) improves average accuracy by 8% while reducing thinking tokens by 72%, exposing substantial overthinking. Motivated by this finding, we propose ThinkBrake, which monitors the log-probability margin between the top continuation token and </think> at sentence boundaries and stops reasoning when this margin narrows. ThinkBrake requires no training and achieves favorable accuracy-efficiency trade-offs across math, scientific QA, and tool-usage benchmarks, reducing thinking-token usage by up to 30%. Furthermore, we provide a theoretical analysis showing that ThinkBrake is equivalent to test-time realignment with a reward bonus for the </think> token.
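The margin-based stopping rule and its realignment interpretation can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes greedy decoding, a fixed absolute threshold `tau` on the margin (the abstract does not say how the threshold is chosen), and small dictionaries of toy logits in place of a real model's vocabulary. The names `brake_margin`, `should_brake`, `realigned_greedy_token`, `first_brake_index`, and `BOUNDARIES` are hypothetical. The final loop checks, in this simplified setting only, that braking when the margin falls below `tau > 0` picks the same token as greedy decoding after a bonus of `tau` is added to the log-probability of </think>.

```python
import math
from typing import Dict, List, Optional

END_THINK = "</think>"  # token that closes the reasoning block


def log_softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert raw logits over a toy vocabulary into log-probabilities."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {tok: v - log_z for tok, v in logits.items()}


def brake_margin(logprobs: Dict[str, float]) -> float:
    """Log-prob of the top next token minus log-prob of </think> (always >= 0)."""
    return max(logprobs.values()) - logprobs[END_THINK]


def should_brake(logprobs: Dict[str, float], tau: float) -> bool:
    """Assumed rule: stop when </think> is within tau nats of the top token."""
    return brake_margin(logprobs) < tau


def realigned_greedy_token(logprobs: Dict[str, float], bonus: float) -> str:
    """Greedy choice after adding a constant reward bonus to </think>.

    Renormalizing would not change the argmax, so it is skipped here.
    """
    shifted = dict(logprobs)
    shifted[END_THINK] += bonus
    return max(shifted, key=shifted.get)


def first_brake_index(boundaries: List[Dict[str, float]], tau: float) -> Optional[int]:
    """Index of the first sentence boundary where the brake fires, else None."""
    for i, logits in enumerate(boundaries):
        if should_brake(log_softmax(logits), tau):
            return i
    return None


# Hypothetical next-token logits observed at three successive sentence boundaries.
BOUNDARIES = [
    {"Wait": 2.0, "So": 1.0, END_THINK: -1.0},  # margin 3.0: keep thinking
    {"Wait": 1.5, "So": 1.2, END_THINK: 0.8},   # margin 0.7: brake if tau > 0.7
    {"Wait": 0.0, "So": 0.5, END_THINK: 2.0},   # </think> is already the top token
]

if __name__ == "__main__":
    tau = 1.0
    print(first_brake_index(BOUNDARIES, tau))  # prints 1
    # With tau > 0 (and no ties), braking coincides with greedy decoding
    # choosing </think> once it receives a bonus of tau.
    for logits in BOUNDARIES:
        lp = log_softmax(logits)
        assert should_brake(lp, tau) == (realigned_greedy_token(lp, tau) == END_THINK)
```

In an actual decoding loop, the check would presumably run only when the generated text ends a sentence; when it fires, the decoder would append </think> and let the model produce its final answer.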

Sangjun Song, Minjae Oh, Seungkyu Lee, Sungmin Jo, Yohan Jo • 2025

Related benchmarks

Task	Dataset	Result
Science Reasoning	ARC-C	Accuracy 97	65
Mathematical Reasoning	AIME 2024	Accuracy 77.9	54
Science Reasoning	GPQA Diamond	Accuracy 72.7	52
Math Reasoning	GSM8K	Accuracy 96.5	49
Math Reasoning	MATH 500	Accuracy 99.2	36
Math and Science Reasoning	Average	Accuracy 87.6	36
Math Reasoning	AIME 2024	Accuracy 86.7	36
Math Reasoning	AIME 2025	Accuracy 73.3	36
Mathematical Reasoning	MATH 500	Accuracy 97.2	22
Scientific Reasoning	GPQA Diamond	Accuracy 67.7	22

The table above lists 10 of the 18 related benchmark results.
