Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning

About

Large reasoning models (LRMs) improve problem solving through extended reasoning, but often misallocate test-time compute. Existing efficiency methods reduce cost by compressing reasoning traces or conditioning budget on perceived difficulty, yet largely overlook solvability. As a result, they may spend large budgets on queries beyond the model's capability while compressing hard-but-solvable queries that require deeper reasoning. In this work, we formulate adaptive reasoning as a computational investment under uncertainty, where budget should follow the expected return of reasoning rather than perceived difficulty alone. To instantiate this principle, we propose Budget-Efficient Thinking (BET), a two-stage framework that combines behavioral cold-start with GRPO under an investment-cost-aware reward. By aligning solve-or-fold decisions with rollout-derived solvability, BET learns three behaviors: (1) short solve, answering easy queries concisely; (2) nice fold, abstaining early when continued reasoning has near-zero expected return; and (3) hero call, preserving sufficient compute for hard-but-solvable queries. Across seven benchmarks and three base models, BET reduces reasoning tokens by ~55% on average while achieving overall performance improvements, and transfers zero-shot from mathematical reasoning to scientific QA and logical reasoning with comparable efficiency gains.
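To make the investment framing concrete, the following is a minimal sketch of a solve-or-fold rule driven by rollout-derived solvability. It is not the paper's reward or training procedure: the function names, the linear token cost, `cost_per_token`, and `short_budget` are all hypothetical choices made for illustration, and the payoff of a correct answer is assumed to be 1.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decision:
    behavior: str  # "short solve", "nice fold", or "hero call"
    budget: int    # reasoning tokens to allocate


def rollout_solvability(rollout_correct: Sequence[bool]) -> float:
    """Estimate solvability as the fraction of sampled rollouts that were correct."""
    if not rollout_correct:
        return 0.0
    return sum(rollout_correct) / len(rollout_correct)


def allocate_budget(
    solvability: float,
    tokens_needed: int,
    cost_per_token: float = 1e-4,  # hypothetical linear compute cost
    short_budget: int = 512,       # hypothetical cutoff for a "short" answer
) -> Decision:
    """Pick a behavior from the expected return of continued reasoning."""
    # Expected return = P(correct) * payoff(=1) minus the cost of the tokens spent.
    expected_return = solvability - cost_per_token * tokens_needed
    if expected_return <= 0:
        # Reasoning is not expected to pay off: abstain early.
        return Decision("nice fold", 0)
    if tokens_needed <= short_budget:
        # Cheap and likely solvable: answer concisely.
        return Decision("short solve", tokens_needed)
    # Expensive but still worth it: keep the full budget.
    return Decision("hero call", tokens_needed)
```

Under these assumptions, a query that every rollout solves in a few hundred tokens maps to a short solve, a query that no rollout solves maps to a nice fold, and a query solved by half the rollouts at a cost of a few thousand tokens maps to a hero call.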

Zhaomeng Zhou, Lan Zhang, Junyang Wang, Mu Yuan, Junda Lin • 2026

Related benchmarks

Task	Dataset	Result
Out-of-Domain Reasoning Aggregation	OOD Average	Accuracy 63.57	22
Logical reasoning	LSAT-AR	Accuracy 74.35	22
Scientific Question Answering	GPQA Diamond	Accuracy (ACC) 52.53	22
Multi-step Narrative Reasoning	MuSR	Accuracy 63.84	22

