
ThinkBrake: Efficient Reasoning via Log-Probability Margin Guided Decoding

About

Large Reasoning Models (LRMs) allocate substantial inference-time compute to Chain-of-Thought (CoT) reasoning, improving performance on mathematics, scientific QA, and tool usage. However, this introduces overthinking: LRMs often reach a correct intermediate solution, continue reasoning, and overwrite it with an incorrect answer. We first demonstrate that oracle stopping--where we inject </think> at every sentence boundary and select the best stopping point in hindsight--improves average accuracy by 8% while reducing thinking tokens by 72%, exposing substantial overthinking. Motivated by this finding, we propose ThinkBrake, which monitors the log-probability margin between the top continuation token and </think> at sentence boundaries, stopping reasoning when this margin narrows. ThinkBrake requires no training and achieves favorable accuracy-efficiency trade-offs across math, scientific QA, and tool usage benchmarks, reducing thinking token usage by up to 30%. Furthermore, we provide theoretical analysis showing that ThinkBrake is equivalent to test-time realignment with a reward bonus for the </think> token.
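The stopping rule described above can be illustrated with a minimal sketch. It assumes the check runs on the next-token logits at a sentence boundary and that "the margin narrows" means the margin falls to or below a fixed threshold. The function names, the dictionary-of-logits interface, and the threshold value are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Dict


def log_softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert raw next-token logits into log-probabilities (numerically stable)."""
    peak = max(logits.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return {token: value - log_z for token, value in logits.items()}


def should_stop_thinking(
    logits: Dict[str, float],
    end_token: str = "</think>",
    threshold: float = 1.0,  # hypothetical value; the abstract does not state one
) -> bool:
    """Return True when the log-probability margin between the top token
    and the end-of-thinking token is at or below the threshold."""
    logprobs = log_softmax(logits)
    top_logprob = max(logprobs.values())
    margin = top_logprob - logprobs[end_token]
    return margin <= threshold


# Called only at sentence boundaries, e.g. after the model emits "." or "\n".
narrow = {"So": 2.0, "</think>": 1.5, "Wait": 0.0}   # margin 0.5
wide = {"Wait": 5.0, "</think>": 0.0, "So": 1.0}     # margin 5.0
print(should_stop_thinking(narrow), should_stop_thinking(wide))
```

Under this thresholded reading, the check is the same as adding `threshold` to the `</think>` log-probability and then decoding greedily, which is one way to interpret the abstract's remark about a reward bonus for the `</think>` token.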

Sangjun Song, Minjae Oh, Seungkyu Lee, Sungmin Jo, Yohan Jo • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Science Reasoning | ARC-C | Accuracy | 97 | 58 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 77.9 | 54 |
| Science Reasoning | GPQA D | Accuracy | 72.7 | 52 |
| Math Reasoning | GSM8K | Accuracy | 96.5 | 49 |
| Math Reasoning | MATH 500 | Accuracy | 99.2 | 36 |
| Math and Science Reasoning | Average | Accuracy | 87.6 | 36 |
| Math Reasoning | AIME 2024 | Accuracy | 86.7 | 36 |
| Math Reasoning | AIME 2025 | Accuracy | 73.3 | 36 |
| Mathematical Reasoning | MATH500 | Accuracy | 97.2 | 22 |
| Scientific Reasoning | GPQA D | Accuracy | 67.7 | 22 |
Showing 10 of 18 rows
