SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking
About
Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit pervasive "overthinking," generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure. SAT formulates reasoning as a Finite-State Machine (FSM) with four distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT reduces reasoning tokens by up to 40% while generally maintaining or improving accuracy.
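To make the FSM idea concrete, here is a minimal Python sketch of a four-mode state machine driven by a per-step score. It is an illustration only, not the authors' implementation: the abstract does not specify the transition rules, so the thresholds `HIGH` and `LOW`, the one-step-at-a-time transitions, the default start state, and the convention that a higher PRM score means an easier step are all assumptions made for this example.

```python
from enum import IntEnum
from typing import Iterable, List


class Mode(IntEnum):
    """Thinking modes named in the abstract, ordered from most to least effort."""
    SLOW = 0
    NORMAL = 1
    FAST = 2
    SKIP = 3


# Hypothetical thresholds; the abstract does not give actual values.
HIGH = 0.8  # at or above this score, the step is treated as easy
LOW = 0.4   # at or below this score, the step is treated as hard


def transition(current: Mode, prm_score: float) -> Mode:
    """Return the next mode given the current mode and a PRM score in [0, 1].

    Assumed rule: an easy step moves one mode toward SKIP, a hard step moves
    one mode toward SLOW, and anything in between keeps the current mode.
    """
    if not 0.0 <= prm_score <= 1.0:
        raise ValueError("prm_score must lie in [0, 1]")
    if prm_score >= HIGH:
        return Mode(min(current + 1, Mode.SKIP))
    if prm_score <= LOW:
        return Mode(max(current - 1, Mode.SLOW))
    return current


def run(prm_scores: Iterable[float], start: Mode = Mode.NORMAL) -> List[Mode]:
    """Walk the state machine over a sequence of step scores."""
    modes: List[Mode] = []
    state = start
    for score in prm_scores:
        state = transition(state, score)
        modes.append(state)
    return modes
```

Calling `run([0.9, 0.9, 0.1])` walks the machine from Normal through three steps. In a real system the scores would come from the PRM and each mode would map to a decoding budget or prompt for that step; both are outside the scope of this sketch.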
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AMC | Accuracy (ACC) | 100 | 203 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 73.3 | 151 |
| Mathematical Reasoning | GSM8K | Accuracy | 96.6 | 60 |
| Mathematical Reasoning | MATH 500 | Accuracy (%) | 97 | 54 |
| Mathematical Reasoning | MATH 500 | Accuracy | 97 | 36 |
| Mathematical Reasoning | AIME 2025 | Accuracy (%) | 73.3 | 30 |