Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning

About

This paper investigates the entropy dynamics of Chain-of-Thought (CoT) reasoning and uncovers a consistent two-phase structure: an Uncertainty Region of exploration that transitions sharply to a Confidence Region of convergence. We demonstrate that the Confidence Region has two critical properties: 1) High Reliability: answers in the Confidence Region become highly accurate and stable, and 2) High Redundancy: models generate unnecessary tokens long after reaching the correct answer. These properties unlock more efficient and reliable inference strategies: 1) Early Exit leverages reliability and redundancy to terminate computation safely when returns diminish, and 2) Test-Time Scaling uses the Confidence Region signal to prioritize converged trajectories. To operationalize these insights, we formulate Confidence Region detection as a sequential change-point detection problem and are the first to apply classical change-point methods to monitoring CoT reasoning. Using the Cumulative Sum (CUSUM) algorithm, a statistically optimal change-point detector, we develop a training-free framework for real-time inference control. Experiments show that our approach establishes a superior Pareto frontier for early exit. CUSUM achieves 63.06% accuracy with an 11.1% token reduction, outperforming DEER and Dynasor by 3.28% and 4.36% in accuracy, respectively. For test-time scaling, CUSUM-weighted voting consistently outperforms self-consistency.
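To make the change-point idea concrete, here is a minimal sketch of a one-sided CUSUM detector that watches a per-token entropy sequence for a downward mean shift. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian model with shared variance, the function names `token_entropy` and `cusum_drop_index`, and the parameters `mu_high`, `mu_low`, `sigma`, and `threshold` are all hypothetical choices that the abstract does not specify.

```python
import math
from typing import Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cusum_drop_index(
    entropies: Sequence[float],
    mu_high: float,    # assumed mean entropy before the change (exploration)
    mu_low: float,     # assumed mean entropy after the change (convergence)
    sigma: float,      # assumed shared standard deviation
    threshold: float,  # detection threshold; higher means fewer false alarms
) -> Optional[int]:
    """Return the first index at which CUSUM flags a shift from mu_high
    to mu_low, or None if the threshold is never reached.

    Assumption: entropies are modeled as Gaussian with a shared sigma,
    so the per-step log-likelihood ratio has the closed form below.
    """
    stat = 0.0
    for t, h in enumerate(entropies):
        # Log-likelihood ratio of "post-change" vs "pre-change" for sample h.
        llr = ((mu_low - mu_high) / sigma**2) * (h - (mu_high + mu_low) / 2.0)
        # CUSUM recursion: accumulate evidence, resetting at zero.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return t
    return None


# Synthetic trace: five high-entropy steps, then five low-entropy steps.
trace = [2.0] * 5 + [0.2] * 5
print(cusum_drop_index(trace, mu_high=2.0, mu_low=0.2, sigma=0.5, threshold=10.0))  # -> 6
```

In an early-exit setting, the returned index would mark a candidate point at which to stop generating. The threshold trades detection delay against false alarms, which is one way a detector of this kind could trace out an accuracy versus token-cost frontier.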

Ting Xu, Xu He, Yupu Lu, Jiankai Sun, Dong Li, Wai Lam, Jianye Hao • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Chain-of-Thought Reasoning | AIME 25 | Accuracy 78.13 | 12 |
| Chain-of-Thought Reasoning | AIME 24 | Accuracy 77.71 | 12 |
| Chain-of-Thought Reasoning | GPQA Diamond | Accuracy 67.68 | 12 |
| Chain-of-Thought Reasoning | Average AIME25 AIME24 GPQA-Diamond | Accuracy 73.3 | 12 |
| Multi-hop Question Answering | HotpotQA | EM 34.1 | 10 |
| Code execution output prediction | LiveCodeBench | Accuracy 86.6 | 4 |
| Mathematical Question Answering | AMC23 | Accuracy 89.36 | 2 |