ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference

About

Large reasoning models (LRMs) achieve state-of-the-art performance by generating long chains-of-thought, but often waste computation on redundant reasoning after the correct answer has already been reached. We introduce Early-Stopping for Token-Aware Reasoning (ESTAR), which detects and reduces such reasoning redundancy to improve efficiency without sacrificing accuracy. Our method combines (i) a trajectory-based classifier that identifies when reasoning can be safely stopped, (ii) supervised fine-tuning to teach LRMs to propose self-generated <stop> signals, and (iii) <stop>-aware reinforcement learning that truncates rollouts at self-generated stop points with compute-aware rewards. Experiments on four reasoning datasets show that ESTAR reduces reasoning length by about 3.7x (from 4,799 to 1,290) while preserving accuracy (74.9% vs. 74.2%), with strong cross-domain generalization. These results highlight early stopping as a simple yet powerful mechanism for improving reasoning efficiency in LRMs.
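The abstract names the stop-aware pieces but does not specify the classifier's decision rule or the form of the compute-aware reward. The sketch below is a minimal illustration that rests on three assumptions: the classifier emits a per-prefix "safe to stop" probability, stopping happens at a fixed threshold, and the reward is correctness minus a linear length penalty. All function names, the 0.9 threshold, and the penalty weight `lam` are hypothetical. This is not ESTAR's actual implementation.

```python
from typing import List, Sequence

STOP_TOKEN = "<stop>"  # stop signal named in the abstract


def truncate_at_stop(tokens: Sequence[str], stop_token: str = STOP_TOKEN) -> List[str]:
    """Cut a rollout at the first self-generated stop token (kept in the output).

    Tokens after the stop signal are dropped, so they are never scored.
    If no stop token appears, the full rollout is returned unchanged.
    """
    for i, tok in enumerate(tokens):
        if tok == stop_token:
            return list(tokens[: i + 1])
    return list(tokens)


def first_safe_stop(prefix_scores: Sequence[float], threshold: float = 0.9) -> int:
    """Hypothetical classifier rule: prefix_scores[i] is the assumed probability
    that stopping after step i+1 is safe. Returns the first prefix length whose
    score reaches the threshold, or -1 if no prefix qualifies."""
    for i, p in enumerate(prefix_scores):
        if p >= threshold:
            return i + 1
    return -1


def compute_aware_reward(correct: bool, num_tokens: int, max_tokens: int,
                         lam: float = 0.5) -> float:
    """Hypothetical compute-aware reward: 1 for a correct answer (0 otherwise)
    minus lam times the fraction of the token budget used (capped at 1)."""
    length_penalty = lam * min(num_tokens, max_tokens) / max_tokens
    return (1.0 if correct else 0.0) - length_penalty


if __name__ == "__main__":
    rollout = ["step 1", "step 2", STOP_TOKEN, "step 3", "step 4"]
    kept = truncate_at_stop(rollout)
    print(kept)  # → ['step 1', 'step 2', '<stop>']
    print(first_safe_stop([0.2, 0.6, 0.95, 0.99]))  # → 3
    print(round(compute_aware_reward(True, len(kept), max_tokens=10), 2))  # → 0.85
    # Length ratio reported in the abstract: 4,799 / 1,290
    print(round(4799 / 1290, 1))  # → 3.7
```

The linear penalty is only one possible choice. It makes a shorter correct rollout always score higher than a longer correct one, and it never lets an incorrect answer outscore a correct one when `lam` is below 1.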

Junda Wang, Zhichao Yang, Dongxu Zhang, Sanjit Singh Batra, Robert E. Tillman (2026)

Related benchmarks

Task	Dataset	Metric	Result
Question Answering	MedQA-USMLE (test)	Accuracy (%)	77.13	101
Closed Question Answering	JAMA (test)	Accuracy (%)	57.2	9
Open Question Answering	AIME 2025 (test)	Accuracy (%)	70	9
Open Question Answering	MATH500 (test)	Accuracy (%)	93.8	9
