ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference

About

Large reasoning models (LRMs) achieve state-of-the-art performance by generating long chains-of-thought, but often waste computation on redundant reasoning after the correct answer has already been reached. We introduce Early-Stopping for Token-Aware Reasoning (ESTAR), which detects and reduces such reasoning redundancy to improve efficiency without sacrificing accuracy. Our method combines (i) a trajectory-based classifier that identifies when reasoning can be safely stopped, (ii) supervised fine-tuning to teach LRMs to propose self-generated <stop> signals, and (iii) <stop>-aware reinforcement learning that truncates rollouts at self-generated stop points with compute-aware rewards. Experiments on four reasoning datasets show that ESTAR reduces reasoning length by about 3.7x (from 4,799 to 1,290) while preserving accuracy (74.9% vs. 74.2%), with strong cross-domain generalization. These results highlight early stopping as a simple yet powerful mechanism for improving reasoning efficiency in LRMs.
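
The stop-point truncation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names, the token list, and in particular the linear length penalty in `compute_aware_reward` are assumptions made for illustration, since the abstract does not specify the reward's form. The last line only checks the reported reduction ratio from the two lengths quoted above.

```python
STOP = "<stop>"  # the self-generated stop signal named in the abstract


def truncate_at_stop(tokens, stop_token=STOP):
    """Return the tokens before the first stop token (all tokens if none)."""
    if stop_token in tokens:
        return tokens[:tokens.index(stop_token)]
    return list(tokens)


def compute_aware_reward(is_correct, num_tokens, budget, penalty=0.5):
    """Hypothetical reward: 1 for a correct answer, minus a penalty that
    grows linearly with the fraction of the token budget used (capped at 1).
    The paper's actual reward may differ."""
    used = min(num_tokens / budget, 1.0)
    return (1.0 if is_correct else 0.0) - penalty * used


# Toy rollout: everything after <stop> is redundant re-checking.
rollout = ["step1", "step2", "answer", "<stop>", "recheck", "recheck"]
kept = truncate_at_stop(rollout)
print(kept)                                           # ['step1', 'step2', 'answer']
print(compute_aware_reward(True, len(kept), budget=6))  # 0.75

# Reduction ratio implied by the reported lengths (4,799 -> 1,290).
print(round(4799 / 1290, 2))                          # 3.72
```

Under this assumed reward, a correct rollout that stops early scores higher than an equally correct one that keeps reasoning, which is the incentive the abstract attributes to compute-aware rewards.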

Junda Wang, Zhichao Yang, Dongxu Zhang, Sanjit Singh Batra, Robert E. Tillman • 2026

Related benchmarks

Task                       Dataset             Result           Rank
Question Answering         MedQA-USMLE (test)  Accuracy: 77.13  101
Closed Question Answering  JAMA (test)         Accuracy: 57.2   9
Open Question Answering    AIME 2025 (test)    Accuracy: 70     9
Open Question Answering    MATH500 (test)      Accuracy: 0.938  9
