SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking

About

Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit pervasive "overthinking": they generate unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure. SAT formulates reasoning as a Finite-State Machine (FSM) with four distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT achieves up to a 40% reduction in reasoning tokens while generally maintaining or improving accuracy.
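The FSM idea in the abstract can be illustrated with a minimal sketch. Only the four mode names come from the abstract. The transition rule (move one state at a time), the thresholds `low` and `high`, the per-mode token budgets in `STEP_BUDGET`, and the assumption that a higher PRM score means an easier step are all hypothetical choices made for illustration, not the authors' method.

```python
from enum import Enum


class Mode(Enum):
    SLOW = "slow"
    NORMAL = "normal"
    FAST = "fast"
    SKIP = "skip"


# Hypothetical per-step token budgets for each mode (not taken from the paper).
STEP_BUDGET = {Mode.SLOW: 512, Mode.NORMAL: 256, Mode.FAST: 64, Mode.SKIP: 0}

# Modes ordered from most to least deliberate.
ORDER = [Mode.SLOW, Mode.NORMAL, Mode.FAST, Mode.SKIP]


def next_mode(current: Mode, prm_score: float,
              low: float = 0.4, high: float = 0.8) -> Mode:
    """Assumed rule: a high score moves one state toward SKIP,
    a low score moves one state toward SLOW, otherwise stay put."""
    i = ORDER.index(current)
    if prm_score >= high:
        i = min(i + 1, len(ORDER) - 1)
    elif prm_score < low:
        i = max(i - 1, 0)
    return ORDER[i]


def plan_budgets(prm_scores, start: Mode = Mode.NORMAL):
    """Walk the FSM over a sequence of step scores and
    return the (mode, token budget) chosen after each score."""
    mode = start
    plan = []
    for score in prm_scores:
        mode = next_mode(mode, score)
        plan.append((mode, STEP_BUDGET[mode]))
    return plan
```

Under these assumptions, a run of confident steps drifts toward Fast and Skip, and a single low score pulls the process back one state toward slower, deeper reasoning. A real system would obtain the scores from a trained PRM and would likely use a different transition policy.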

Weiyang Huang, Xuefeng Bai, Kehai Chen, Xinyang Chen, Yibin Chen, Weili Guan, Min Zhang • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mathematical Reasoning | AMC | Accuracy (ACC) | 100 | 203
Mathematical Reasoning | AIME 2024 | Accuracy | 73.3 | 151
Mathematical Reasoning | GSM8K | Accuracy | 96.6 | 60
Mathematical Reasoning | MATH 500 | Accuracy (%) | 97 | 54
Mathematical Reasoning | MATH 500 | Accuracy | 97 | 36
Mathematical Reasoning | AIME 2025 | Accuracy (%) | 73.3 | 30
