Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

About

Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance this trade-off, we introduce Conditional Entropy Shaping (CES), a framework that dynamically controls token-level response entropy, enabling LLMs to produce concise solutions on simple problems while encouraging deeper exploration on hard ones. Built on DAPO, CES uses token-level entropy as an uncertainty signal and applies a conditional bidirectional policy: it penalizes high-entropy "forking point" tokens on correct reasoning paths to improve conciseness, and rewards them on incorrect paths to encourage exploration and error correction. We implement CES on DeepSeek-R1-Distill-7B and evaluate it on 12 mathematical benchmarks. CES consistently improves average accuracy while reducing response length relative to DAPO, and supplementary experiments show similar trends on a smaller 1.5B backbone and on out-of-domain benchmarks.
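The abstract states the CES rule only in words. As a rough illustration, the Python sketch below applies that conditional sign to per-token advantages. It assumes group-relative advantages of the kind used in GRPO-family methods such as DAPO, which are positive for correct responses and negative for incorrect ones under a binary reward. Everything concrete here is a hypothetical choice, not the paper's formula: the function names `token_entropy`, `forking_mask`, and `shape_advantages`, the top-fraction rule for picking forking tokens, the strength `alpha`, and the cap `kappa`. The cap bounds the bonus at |A|/kappa so the shaping term can never flip the sign of an advantage.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution p(. | prefix)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def forking_mask(entropies: Sequence[float], top_frac: float = 0.2) -> List[bool]:
    """Flag the top `top_frac` highest-entropy tokens of a response as forking points.

    HYPOTHETICAL selection rule: tokens tied with the k-th largest entropy are also flagged.
    """
    if not entropies:
        return []
    k = max(1, round(top_frac * len(entropies)))
    threshold = sorted(entropies, reverse=True)[k - 1]
    return [h >= threshold for h in entropies]


def shape_advantages(
    advantages: Sequence[float],
    entropies: Sequence[float],
    is_correct: bool,
    alpha: float = 0.1,  # HYPOTHETICAL shaping strength
    kappa: float = 2.0,  # HYPOTHETICAL cap: bonus <= |A| / kappa
    top_frac: float = 0.2,  # HYPOTHETICAL fraction of tokens treated as forks
) -> List[float]:
    """Illustrative conditional, bidirectional entropy shaping of per-token advantages.

    Correct response:   forking tokens get A - bonus (weaker reinforcement).
    Incorrect response: forking tokens get A + bonus (softer penalty, keeps exploring).
    Entropies are treated as constants: no gradient flows through the bonus.
    """
    if len(advantages) != len(entropies):
        raise ValueError("advantages and entropies must have the same length")
    mask = forking_mask(entropies, top_frac)
    sign = -1.0 if is_correct else 1.0
    shaped = []
    for a, h, is_fork in zip(advantages, entropies, mask):
        if is_fork:
            # Capping at |A| / kappa (kappa > 1) keeps the shaped advantage on the same side of zero.
            bonus = min(alpha * h, abs(a) / kappa)
            shaped.append(a + sign * bonus)
        else:
            shaped.append(a)
    return shaped


if __name__ == "__main__":
    # Toy 5-token response; tokens 1 and 3 carry the highest entropy.
    ent = [0.1, 2.0, 0.2, 1.5, 0.05]
    print(shape_advantages([1.0] * 5, ent, is_correct=True, top_frac=0.4))  # forks pushed below 1.0
    print(shape_advantages([-1.0] * 5, ent, is_correct=False, top_frac=0.4))  # forks pulled above -1.0
```

In a real trainer the entropies would be computed from the policy's logits and detached before shaping, so the bonus only reweights each token's log-probability term rather than adding an entropy gradient of its own.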

Shuyu Wei, Jian Sun, Delai Qiu, Yining Wang, Shengping Liu, Jiaen Liang, Ying Fu, Wei Huang, Jitao Sang • 2026

Related benchmarks

Task                     Dataset               Metric            Result   Rank
Mathematical Reasoning   AIME 24               Accuracy          13.3     318
Reasoning                ARC                   Accuracy          83.6     245
Mathematical Reasoning   Olympiad Bench        Accuracy          32       222
Mathematical Reasoning   AMC23                 Pass@1 Accuracy   55       207
Reasoning                OpenBookQA            Accuracy          74.8     92
Mathematical Reasoning   GaoKao En 2023        Pass@1 Accuracy   79.9     66
Mathematical Reasoning   CMath                 Accuracy          91.9     63
Mathematical Reasoning   College Math          Accuracy          41.6     59
Mathematical Reasoning   SVAMP                 Accuracy          86.4     10
Mathematical Reasoning   GaoKao Math Cloze     Accuracy          78.4     6
Showing 10 of 21 rows
