
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning

About

Recently, large reasoning models have demonstrated exceptional performance on a wide range of tasks. However, they tend to consume excessive tokens even on simple queries, which wastes resources and prolongs user latency. To address this challenge, we propose SelfBudgeter, a self-adaptive reasoning strategy for efficient and controllable reasoning. Specifically, we first train the model to estimate the reasoning budget a query requires. We then introduce budget-guided GRPO, a reinforcement learning stage that reduces output length while maintaining accuracy. Experimental results show that SelfBudgeter dynamically allocates budgets according to problem complexity, reducing average response length by 61% on math reasoning tasks while maintaining accuracy. Furthermore, SelfBudgeter lets users see in advance how long generation will take and decide whether to continue or stop. Users can also control reasoning length directly by setting a token budget upfront.
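The abstract names two mechanisms: a self-estimated budget and budget-guided GRPO. The following is a hypothetical illustration only, not the authors' reward function. It assumes the model prefixes its reasoning with a `<budget>N</budget>` tag (an invented format), and it uses an invented reward that credits a correct answer and penalizes the relative gap between the tokens actually used and the stated budget. Only the group-relative normalization follows the standard GRPO recipe, where each sample's reward is standardized against the other samples drawn for the same query.

```python
import re
import statistics

# Hypothetical output format (an assumption, not from the paper):
# the model states its budget first, as <budget>N</budget>.
BUDGET_RE = re.compile(r"<budget>(\d+)</budget>")


def parse_budget(text):
    """Return the self-estimated token budget, or None if no tag is present."""
    m = BUDGET_RE.search(text)
    return int(m.group(1)) if m else None


def budget_reward(correct, used_tokens, budget, alpha=0.5):
    """Hypothetical budget-aware reward.

    1.0 for a correct answer (0.0 otherwise), minus alpha times the relative
    gap between tokens used and the stated budget, with the gap capped at 1.
    """
    if budget is None or budget <= 0:
        return -1.0  # malformed output: no usable budget
    gap = abs(used_tokens - budget) / budget
    return (1.0 if correct else 0.0) - alpha * min(gap, 1.0)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: four sampled responses to one query (correct?, tokens used, raw text).
samples = [
    (True, 200, "<budget>200</budget> ..."),
    (True, 300, "<budget>200</budget> ..."),
    (False, 100, "<budget>200</budget> ..."),
    (True, 150, "no budget tag"),
]
rewards = [budget_reward(ok, used, parse_budget(text)) for ok, used, text in samples]
# rewards == [1.0, 0.75, -0.25, -1.0]
advantages = group_relative_advantages(rewards)
```

Under the same assumed tag format, a user-set budget could be imposed by prefilling the tag before decoding instead of letting the model choose it. The weight `alpha` and the cap on the penalty are arbitrary illustrative choices.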

Zheng Li, Qingxiu Dong, Jingyuan Ma, Di Zhang, Kai Jia, Zhifang Sui• 2025

Related benchmarks

| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | -- | -- | 499 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 25 | 479 |
| Mathematical Reasoning | Minerva Math | Accuracy | 53.4 | 233 |
| Mathematical Problem Solving | MATH500 | Accuracy | 86.87 | 83 |
| Mathematical Reasoning | Minerva | Pass@1 | 25.5 | 22 |
| Mathematical Reasoning | AIME24 | Pass@1 | 13.42 | 18 |
| Mathematical Reasoning | MATH500 | Pass@1 Rate | 53.47 | 18 |
| Math problem solving | AIME 2025 (test) | Accuracy | 30 | 9 |
| Math problem solving | GSM8K (test) | Accuracy | 90.3 | 9 |
| Complex Reasoning | SCoRE (test) | Accuracy | 16.26 | 5 |
Showing 10 of 11 rows
