
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models

About

Recent advancements in slow-thinking reasoning models have shown exceptional performance in complex reasoning tasks. However, these models often exhibit overthinking (generating redundant reasoning steps for simple problems), leading to excessive computational resource usage. While current mitigation strategies uniformly reduce reasoning tokens, they risk degrading performance on challenging tasks that require extended reasoning. This paper introduces Difficulty-Adaptive Slow Thinking (DAST), a novel framework that enables models to autonomously adjust the length of Chain-of-Thought (CoT) based on problem difficulty. We first propose a Token Length Budget (TLB) metric to quantify difficulty, then leverage budget-aware reward shaping and budget preference optimization to implement DAST. DAST penalizes overlong responses for simple tasks while incentivizing sufficient reasoning for complex problems. Experiments on diverse datasets and model scales demonstrate that DAST effectively mitigates overthinking (reducing token usage by over 30% on average) while preserving reasoning accuracy on complex problems. Our code and models are available at https://github.com/AnonymousUser0520/AnonymousRepo01.

Yi Shen, Jian Zhang, Jieyun Huang, Shuming Shi, Wenjing Zhang, Jiangze Yan, Ning Wang, Kai Wang, Zhaoxiang Liu, Shiguo Lian • 2025
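
The abstract describes a per-problem token budget and a reward that depends on how a response's length compares with that budget. The sketch below is one way such a scheme could look, not the authors' formulation: the function names (`token_length_budget`, `shaped_reward`), the interpolation rule, and every constant are assumptions chosen only for illustration. It assumes the budget is estimated from several sampled responses to the same problem.

```python
def token_length_budget(correct_lengths, num_samples, max_length):
    """Hypothetical difficulty-dependent budget (illustrative assumption).

    Interpolates between the mean length of correct samples and the
    maximum length, weighted by sampled accuracy: easy problems (high
    accuracy) get a budget near their typical correct length, hard
    problems get a budget near max_length.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if not correct_lengths:
        # No correct sample: treat the problem as maximally hard.
        return float(max_length)
    accuracy = len(correct_lengths) / num_samples
    mean_correct = sum(correct_lengths) / len(correct_lengths)
    return accuracy * mean_correct + (1.0 - accuracy) * max_length


def shaped_reward(is_correct, length, budget):
    """Hypothetical budget-aware reward (constants are assumptions).

    Correct answers earn more when they stay under budget; incorrect
    answers are penalized more when they stop well short of the budget.
    """
    overshoot = (length - budget) / budget
    if is_correct:
        # Shorter is better, but a correct answer never drops below 0.1.
        return max(0.5 - 0.5 * overshoot, 0.1)
    # Short wrong answers are penalized hardest; capped at -0.1.
    return min(0.9 * overshoot - 0.1, -0.1)


# An easy problem (all 4 samples correct) versus a hard one (1 of 4 correct).
easy_budget = token_length_budget([200, 300, 250, 250], 4, 4096)
hard_budget = token_length_budget([3000], 4, 4096)
print(easy_budget, hard_budget)
print(shaped_reward(True, 500, easy_budget), shaped_reward(True, 125, easy_budget))
```

Under these assumptions, a correct response twice as long as the easy problem's budget is scored at the floor, while the hard problem's much larger budget leaves room for extended reasoning. Pairs of high- and low-reward responses to the same problem could then serve as preference data, which is one plausible reading of "budget preference optimization".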

Related benchmarks

Task                     Dataset          Result                  Rank
Commonsense Reasoning    WinoGrande       --                      1442
Mathematical Reasoning   AIME 24          Accuracy 50.83          318
Mathematical Reasoning   MATH 500         --                      236
Mathematical Reasoning   Olympiad Bench   Accuracy 58.3           222
Mathematical Reasoning   MATH 500         Accuracy 92.4           221
Mathematical Reasoning   OlympiadBench    Accuracy 55.34          213
Mathematical Reasoning   Minerva          --                      138
Mathematical Reasoning   AIME24           Pass@1 Accuracy 77.8    117
Mathematical Reasoning   AMC 23           Pass@1 Accuracy 87.66   109
Scientific Reasoning     GPQA Diamond     Pass@1 Accuracy 65.7    67
Showing 10 of 33 rows
