DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models

About

Recent advances in slow-thinking reasoning models have shown exceptional performance on complex reasoning tasks. However, these models often exhibit overthinking (generating redundant reasoning steps for simple problems), which wastes computational resources. Current mitigation strategies reduce reasoning tokens uniformly, so they risk degrading performance on challenging tasks that require extended reasoning. This paper introduces Difficulty-Adaptive Slow Thinking (DAST), a novel framework that enables models to autonomously adjust the length of their Chain-of-Thought (CoT) to problem difficulty. We first propose a Token Length Budget (TLB) metric to quantify difficulty, then use budget-aware reward shaping and budget preference optimization to implement DAST. DAST penalizes overlong responses to simple tasks while incentivizing sufficient reasoning for complex problems. Experiments on diverse datasets and model scales show that DAST effectively mitigates overthinking (reducing token usage by over 30% on average) while preserving reasoning accuracy on complex problems. Our code and models are available at https://github.com/AnonymousUser0520/AnonymousRepo01.
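The abstract names three mechanisms: a difficulty metric (TLB), budget-aware reward shaping, and preference-pair construction. A minimal Python sketch of how they could fit together follows. It rests on stated assumptions: TLB is taken to blend the mean length of correct sampled responses with the maximum generation length, weighted by the sampled accuracy `p`, so easy problems get short budgets and hard problems get long ones. The reward constants (0.5, 0.9, 0.1), the `margin` threshold, and every function name (`token_length_budget`, `budget_reward`, `build_preference_pairs`) are hypothetical illustrations, not the paper's confirmed definitions; see the linked repository for the actual implementation.

```python
from __future__ import annotations

from statistics import mean


def token_length_budget(lengths: list[int], correct: list[bool], max_len: int) -> float:
    """Hypothetical TLB: p * (mean length of correct samples) + (1 - p) * max_len.

    p is the fraction of sampled responses that are correct, so an easy problem
    (p near 1) gets a budget close to its typical correct length, and a hard one
    (p near 0) gets a budget close to the maximum generation length.
    """
    p = sum(correct) / len(correct)
    correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
    avg_correct = mean(correct_lengths) if correct_lengths else max_len
    return p * avg_correct + (1 - p) * max_len


def budget_reward(length: int, is_correct: bool, budget: float) -> float:
    """Hypothetical budget-aware reward; the constants are illustrative only."""
    lam = (length - budget) / budget  # relative deviation from the budget
    if is_correct:
        # Correct: under budget earns more than 0.5, over budget earns less,
        # never below 0.1, so any correct answer beats any incorrect one.
        return max(-0.5 * lam + 0.5, 0.1)
    # Incorrect: short attempts are penalized hardest, which nudges the model
    # toward longer reasoning on problems it fails; capped at -0.1.
    return min(0.9 * lam - 0.1, -0.1)


def build_preference_pairs(
    samples: list[tuple[str, int, bool]], budget: float, margin: float = 0.3
) -> list[tuple[str, str]]:
    """Return (chosen, rejected) pairs whose rewards differ by more than margin.

    samples holds (response_text, token_length, is_correct) for one prompt.
    """
    scored = [(budget_reward(n, ok, budget), text) for text, n, ok in samples]
    pairs = []
    for r_a, a in scored:
        for r_b, b in scored:
            if r_a - r_b > margin:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    samples = [
        ("short-correct", 100, True),
        ("long-correct", 3000, True),
        ("short-wrong", 300, False),
        ("long-wrong", 4000, False),
    ]
    budget = token_length_budget([n for _, n, _ in samples], [ok for _, _, ok in samples], 4096)
    for chosen, rejected in build_preference_pairs(samples, budget):
        print(f"chosen={chosen}  rejected={rejected}")
```

The resulting (chosen, rejected) pairs could then feed a standard preference-optimization objective such as DPO or SimPO. Which objective DAST actually uses is not stated in this summary.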

Yi Shen, Jian Zhang, Jieyun Huang, Shuming Shi, Wenjing Zhang, Jiangze Yan, Ning Wang, Kai Wang, Zhaoxiang Liu, Shiguo Lian• 2025

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	WinoGrande	--	1581
Mathematical Reasoning	MATH 500	Accuracy 92.4	589
Mathematical Reasoning	AIME 2024	Accuracy 55.6	394
Mathematical Reasoning	AIME 24	Accuracy 50.83	358
Mathematical Reasoning	AIME 2024 (test)	Accuracy 75.6	294
Mathematical Reasoning	MATH 500	--	274
Mathematical Reasoning	Olympiad Bench	Accuracy 58.3	254
Mathematical Reasoning	OlympiadBench	Accuracy 55.34	213
Mathematical Reasoning	GSM8K	Accuracy 86.7	192
Mathematical Reasoning	GSM8K	Accuracy 94.8	166
