Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

About

Large Reasoning Models (LRMs) achieve strong performance by generating long chains of thought (CoT), but often overthink, continuing to reason after a solution has already stabilized and thereby wasting tokens and increasing latency. Existing inference-time early-exit methods rely primarily on answer-level signals, such as confidence or trial-answer consistency, to decide when to stop. However, these signals mainly reflect answer readiness rather than reasoning convergence: they may trigger before the model has finished exploring or self-correcting, causing premature exits that can degrade final-answer accuracy and leave the retained reasoning chain semantically incomplete. We identify reasoning-level semantic redundancy as a complementary signal for semantic-preserving early exit: when successive steps no longer add novel progress and instead revisit established conclusions, the reasoning trajectory has likely converged. Building on this insight, we propose PUMA, a plug-and-play framework that combines a lightweight Redundancy Detector with answer-level verification. The detector flags semantically redundant candidate exits, while verification confirms whether stopping is safe, allowing PUMA to remove redundant continuation while preserving both answer accuracy and a coherent reasoning prefix. Across five LRMs and five challenging reasoning benchmarks, PUMA achieves 26.2% average token reduction while preserving accuracy and retained CoT quality. Additional experiments on code generation, zero-shot vision-language reasoning, and learned stopping-policy internalization further demonstrate that reasoning-level redundancy is a robust, transferable, and learnable signal for efficient reasoning. Our code is available at https://github.com/giovanni-vaccarino/PUMA.
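The two-stage decision described above (a redundancy detector proposes candidate exits, and answer-level verification confirms them) can be illustrated with a minimal sketch. This is not the PUMA implementation: the function names `early_exit`, `jaccard_redundant`, and `has_stated_answer`, the token-overlap heuristic, and the 0.8 threshold are all hypothetical stand-ins chosen so that the example runs without a model.

```python
from typing import Callable, Iterable, List, Tuple


def early_exit(
    steps: Iterable[str],
    is_redundant: Callable[[List[str], str], bool],
    is_safe_to_stop: Callable[[List[str]], bool],
) -> Tuple[List[str], bool]:
    """Return (retained reasoning prefix, whether we exited early)."""
    kept: List[str] = []
    for step in steps:
        # Stage 1 (reasoning-level): does the new step only revisit earlier content?
        if kept and is_redundant(kept, step):
            # Stage 2 (answer-level): stop only if verification says it is safe.
            if is_safe_to_stop(kept):
                return kept, True
        # Not redundant, or not yet safe to stop: keep the step and continue.
        kept.append(step)
    return kept, False


def jaccard_redundant(kept: List[str], step: str, threshold: float = 0.8) -> bool:
    """Toy detector (assumption): high token overlap with any earlier step."""
    new = set(step.lower().split())
    for prev in kept:
        old = set(prev.lower().split())
        union = new | old
        if union and len(new & old) / len(union) >= threshold:
            return True
    return False


def has_stated_answer(kept: List[str]) -> bool:
    """Toy verifier (assumption): some retained step already states an answer."""
    return any("answer" in s.lower() for s in kept)


steps = [
    "Let x be the unknown so 2x + 3 = 11",
    "Subtract 3 to get 2x = 8",
    "So the answer is x = 4",
    "Wait, subtract 3 to get 2x = 8",
    "So the answer is x = 4 again",
]

kept, exited = early_exit(steps, jaccard_redundant, has_stated_answer)
```

In this toy trace the fourth step nearly repeats the second, so the detector flags it, and because an answer has already been stated the verifier allows the exit, leaving the first three steps as the retained prefix. If the verifier rejects the exit, the flagged step is kept and generation continues, which mirrors the claim that verification guards against premature stops.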

Dehai Min, Giovanni Vaccarino, Huiyi Chen, Yongliang Wu, Gal Yona, Lu Cheng • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	OlymBench	Accuracy 72.3	54
Reasoning	Overall Combined Benchmarks	Accuracy 82.5	31
Mathematical Reasoning	AIME 24	Accuracy 90	27
Scientific Reasoning	GPQA D	Accuracy 75.8	27
Mathematical Reasoning	AIME 25	Accuracy 80	27
Mathematical Reasoning	MATH 500	Accuracy 94.2	27
Mathematical Reasoning	Overall Aggregate	Accuracy 68.7	12
Mathematical Reasoning	MATH 500	Accuracy 92.6	12
Mathematical Reasoning	AIME 24	Accuracy 70	12
Mathematical Reasoning	OlympiadBench	Accuracy 62.2	12

Showing 10 of 12 rows
