Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning
About
Large reasoning models (LRMs) improve problem solving through extended reasoning, but often misallocate test-time compute. Existing efficiency methods reduce cost by compressing reasoning traces or conditioning budget on perceived difficulty, yet largely overlook solvability. As a result, they may spend large budgets on queries beyond the model's capability while compressing hard-but-solvable queries that require deeper reasoning. In this work, we formulate adaptive reasoning as a computational investment under uncertainty, where budget should follow the expected return of reasoning rather than perceived difficulty alone. To instantiate this principle, we propose Budget-Efficient Thinking (BET), a two-stage framework that combines behavioral cold-start with GRPO under an investment-cost-aware reward. By aligning solve-or-fold decisions with rollout-derived solvability, BET learns three behaviors: (1) short solve, answering easy queries concisely; (2) nice fold, abstaining early when continued reasoning has near-zero expected return; and (3) hero call, preserving sufficient compute for hard-but-solvable queries. Across seven benchmarks and three base models, BET reduces reasoning tokens by ~55% on average while achieving overall performance improvements, and transfers zero-shot from mathematical reasoning to scientific QA and logical reasoning with comparable efficiency gains.
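The investment framing above can be illustrated with a small decision rule. This is a minimal sketch under stated assumptions, not BET's actual reward or training procedure: the names `BudgetPolicy`, `solvability`, `expected_return`, and `choose_action`, as well as every numeric value, are hypothetical and chosen only for illustration. It estimates solvability as the fraction of correct rollouts and picks whichever of the three behaviors has the highest expected return after subtracting token cost.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BudgetPolicy:
    # All values below are illustrative assumptions, not taken from the paper.
    reward_correct: float = 1.0    # payoff for a correct final answer
    cost_per_token: float = 5e-5   # price charged per reasoning token
    fold_tokens: int = 50          # tokens spent when abstaining early
    short_budget: int = 500        # token budget for a concise answer
    long_budget: int = 8000        # token budget for extended reasoning


def solvability(rollout_correct: Sequence[bool]) -> float:
    """Estimate solve probability as the fraction of correct rollouts."""
    if not rollout_correct:
        raise ValueError("need at least one rollout")
    return sum(rollout_correct) / len(rollout_correct)


def expected_return(p_solve: float, tokens: int, policy: BudgetPolicy) -> float:
    """Expected payoff of spending `tokens` given solve probability `p_solve`."""
    return p_solve * policy.reward_correct - tokens * policy.cost_per_token


def choose_action(p_short: float, p_long: float, policy: BudgetPolicy) -> str:
    """Pick the behavior with the highest expected return.

    p_short: estimated solve probability under the short budget.
    p_long:  estimated solve probability under the long budget.
    """
    options = {
        # Folding earns nothing but costs only a handful of tokens.
        "nice_fold": expected_return(0.0, policy.fold_tokens, policy),
        "short_solve": expected_return(p_short, policy.short_budget, policy),
        "hero_call": expected_return(p_long, policy.long_budget, policy),
    }
    return max(options, key=options.get)


if __name__ == "__main__":
    policy = BudgetPolicy()
    p_long = solvability([True, True, False, True])  # hypothetical rollout outcomes
    print(choose_action(p_short=0.0, p_long=p_long, policy=policy))
```

Under these assumed costs, a query that is solvable only with long reasoning maps to `hero_call`, one that is already solvable with a short budget maps to `short_solve`, and one with near-zero solve probability at either budget maps to `nice_fold`. In the paper these behaviors are learned through GRPO with an investment-cost-aware reward rather than chosen by a fixed rule like this one.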
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Out-of-Domain Reasoning Aggregation | OOD Average | Accuracy | 63.57 | 22 |
| Logical reasoning | LSAT-AR | Accuracy | 74.35 | 22 |
| Scientific Question Answering | GPQA Diamond | Accuracy (ACC) | 52.53 | 22 |
| Multi-step Narrative Reasoning | MuSR | Accuracy | 63.84 | 22 |