Token-Budget-Aware LLM Reasoning

About

Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, leading to increased costs. We find that the reasoning process of current LLMs is unnecessarily lengthy and can be compressed by including a reasonable token budget in the prompt, but the choice of token budget plays a crucial role in the actual compression effectiveness. We then propose a token-budget-aware LLM reasoning framework that dynamically adjusts the number of reasoning tokens based on the reasoning complexity of each problem. Experiments show that our method effectively reduces token costs in CoT reasoning with only a slight performance reduction, offering a practical solution to balance efficiency and accuracy in LLM reasoning. Code: https://github.com/GeniusHTX/TALE
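
The idea of placing a token budget in the prompt can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the prompt wording, and the word-count heuristic are hypothetical and are not taken from the TALE code base. The abstract says the budget should track the reasoning complexity of each problem; the length-based estimate below is only a crude stand-in for that step.

```python
def estimate_budget(question: str, min_budget: int = 16, max_budget: int = 512) -> int:
    """Hypothetical stand-in for a complexity-based budget estimator.

    Scales the budget with the number of whitespace-separated words
    and clamps it to [min_budget, max_budget].
    """
    words = len(question.split())
    return max(min_budget, min(max_budget, 8 * words))


def build_budgeted_prompt(question: str, token_budget: int) -> str:
    """Append a token-budget instruction to a chain-of-thought prompt.

    The instruction wording is an assumption, not the authors' exact prompt.
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    return (
        f"{question}\n"
        f"Let's think step by step and use fewer than {token_budget} tokens."
    )


if __name__ == "__main__":
    question = "What is 2 + 3?"
    budget = estimate_budget(question)
    print(build_budgeted_prompt(question, budget))
```

The resulting string would be sent to an LLM in place of a plain CoT prompt. Because the abstract notes that the choice of budget strongly affects how much compression is achieved, the estimator is the part most worth replacing with something better informed.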

Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen • 2024

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH500 (test)	Accuracy 91.8	895
Mathematical Reasoning	GSM8K	--	499
Mathematical Reasoning	AIME 2024	Accuracy 71.1	220
Mathematical Reasoning	AMC	Accuracy (ACC) 93.3	215
Mathematical Reasoning	AMC23	PASS@1 Accuracy 77.5	207
Mathematical Reasoning	AIME 24	Accuracy 71.1	113
Mathematical Reasoning	AMC 23	Accuracy 94.1	113
Mathematical Reasoning	GSM8K (test)	Accuracy 78.57	79
Mathematical Reasoning	MATH 500	Accuracy (%) 94	54
Mathematical Reasoning	GSM8K (test)	Accuracy 93.6	33

Showing 10 of 55 rows
