Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning

About

Entropy-based deep reasoning has emerged as a promising direction for improving the reasoning capabilities of Large Language Models (LLMs), but existing methods often either increase response length indiscriminately or shorten responses at the cost of accuracy. To better balance this trade-off, we introduce Conditional Entropy Shaping (CES), a framework that dynamically controls token-level response entropy, enabling LLMs to produce concise solutions on simple problems while encouraging deeper exploration on hard ones. Built on DAPO, CES uses token-level entropy as an uncertainty signal and applies a conditional bidirectional policy: it penalizes high-entropy "forking point" tokens on correct reasoning paths to improve conciseness, and rewards them on incorrect paths to encourage exploration and error correction. We implement CES on DeepSeek-R1-Distill-7B and evaluate it on 12 mathematical benchmarks. CES consistently improves average accuracy while reducing response length relative to DAPO, and supplementary experiments show similar trends on a smaller 1.5B backbone and on out-of-domain benchmarks.
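The conditional bidirectional idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the quantile rule for choosing "forking point" tokens, the coefficient `coef`, and the choice to add the shaping term to per-token advantages are all assumptions made for illustration, since the abstract does not give the exact formulation.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution at each position.

    logits: array of shape (seq_len, vocab_size).
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)

def shaped_advantages(advantages, entropies, is_correct, quantile=0.8, coef=0.1):
    """Hypothetical shaping rule (an assumption, not taken from the paper).

    Tokens whose entropy is at or above the given quantile of the response
    are treated as "forking points". On a correct response their advantage
    is lowered by `coef` (favoring conciseness); on an incorrect response it
    is raised by `coef` (favoring exploration).
    """
    threshold = np.quantile(entropies, quantile)
    forking = entropies >= threshold
    sign = -1.0 if is_correct else 1.0
    return advantages + sign * coef * forking.astype(advantages.dtype)

# Usage: one response with five tokens over a toy vocabulary of four symbols.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 4))
ent = token_entropy(logits)
adv_correct = shaped_advantages(np.ones(5), ent, is_correct=True)
adv_incorrect = shaped_advantages(-np.ones(5), ent, is_correct=False)
```

In a DAPO-style training loop, the shaped per-token advantages would stand in for the unshaped ones in the policy-gradient loss; how CES actually combines the two signals is not specified in the abstract.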

Shuyu Wei, Jian Sun, Delai Qiu, Yining Wang, Shengping Liu, Jiaen Liang, Ying Fu, Wei Huang, Jitao Sang • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	AIME 24	Accuracy: 13.3	358
Reasoning	ARC	Accuracy: 83.6	269
Mathematical Reasoning	Olympiad Bench	Accuracy: 32	254
Mathematical Reasoning	AMC23	Pass@1 Accuracy: 55	216
Reasoning	OpenBookQA	Accuracy: 74.8	92
Mathematical Reasoning	GaoKao En 2023	Pass@1 Accuracy: 79.9	66
Mathematical Reasoning	CMath	Accuracy: 91.9	63
Mathematical Reasoning	College Math	Accuracy: 41.6	59
Mathematical Reasoning	SVAMP	Accuracy: 86.4	10
Mathematical Reasoning	GaoKao Math Cloze	Accuracy: 78.4	6

Showing 10 of 21 rows
