Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning
About
This paper investigates the entropy dynamics of Chain-of-Thought (CoT) reasoning and uncovers a consistent two-phase structure: an Uncertainty Region of exploration that transitions sharply into a Confidence Region of convergence. We show that the Confidence Region has two critical properties: 1) High Reliability, as answers in the Confidence Region become highly accurate and stable, and 2) High Redundancy, as models keep generating unnecessary tokens long after reaching the correct answer. These properties enable more efficient and reliable inference strategies: 1) Early Exit, which exploits reliability and redundancy to terminate computation safely once returns diminish, and 2) Test-Time Scaling, which uses the Confidence Region signal to prioritize converged trajectories. To operationalize these insights, we formulate Confidence Region detection as a sequential change-point detection problem and are the first to apply classical change-point methods to monitoring CoT reasoning. Using the Cumulative Sum (CUSUM) algorithm, a statistically optimal change-point detector, we develop a training-free framework for real-time inference control. Experiments show that our approach establishes a superior Pareto frontier for early exit: CUSUM achieves 63.06% accuracy with an 11.1% token reduction, outperforming DEER and Dynasor in accuracy by 3.28% and 4.36%, respectively. For test-time scaling, CUSUM-weighted voting consistently outperforms self-consistency.
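The detection step can be illustrated with a one-sided (lower) CUSUM over an entropy signal. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes one scalar entropy value per reasoning step, a hand-chosen pre-change mean `mu0`, a drift allowance `k`, and an alarm threshold `h`. The statistic accumulates evidence that entropy has fallen below `mu0 - k` and raises an alarm once it exceeds `h`; an early-exit controller could stop decoding at that step. The `cusum_weighted_vote` function uses a hypothetical weighting (a small base weight plus the share of the trajectory at or after the alarm) only to show how a CUSUM signal might reweight self-consistency voting. All function and class names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class CusumMonitor:
    """Streaming one-sided CUSUM that flags a downward shift in mean entropy.

    mu0: assumed mean entropy before the change (Uncertainty Region).
    k:   drift allowance; per-step drops smaller than k add no evidence.
    h:   alarm threshold on the cumulative statistic.
    """

    def __init__(self, mu0: float, k: float, h: float) -> None:
        self.mu0, self.k, self.h = mu0, k, h
        self.s = 0.0

    def update(self, x: float) -> bool:
        # Page's recursion for a decrease: S_t = max(0, S_{t-1} + (mu0 - x_t - k)).
        self.s = max(0.0, self.s + (self.mu0 - x - self.k))
        return self.s > self.h


def cusum_change_point(entropies: Sequence[float], mu0: float, k: float, h: float) -> Optional[int]:
    """Index of the first alarm (where early exit would trigger), or None."""
    monitor = CusumMonitor(mu0, k, h)
    for t, x in enumerate(entropies):
        if monitor.update(x):
            return t
    return None


def cusum_weighted_vote(trajectories: List[Tuple[str, List[float]]],
                        mu0: float, k: float, h: float, base: float = 0.1) -> str:
    """Vote over (answer, per-step entropies) pairs with a hypothetical CUSUM weight."""
    scores: Dict[str, float] = defaultdict(float)
    for answer, ent in trajectories:
        tau = cusum_change_point(ent, mu0, k, h)
        # Hypothetical weight: base + share of steps at or after the alarm.
        conf_share = 0.0 if tau is None else (len(ent) - tau) / len(ent)
        scores[answer] += base + conf_share
    return max(scores, key=scores.get)


if __name__ == "__main__":
    converged = [2.1, 1.9, 2.2, 2.0, 0.4, 0.3, 0.2, 0.3]  # entropy drops at step 4
    print(cusum_change_point(converged, mu0=2.0, k=0.25, h=3.0))  # 6
    votes = [("42", converged), ("17", [2.1, 2.3, 2.0, 2.2]), ("17", [2.2, 2.1, 2.3])]
    print(cusum_weighted_vote(votes, mu0=2.0, k=0.25, h=3.0))  # 42
```

In this toy example the first trace's entropy drops at step 4 and the alarm fires at step 6, a two-step detection delay; raising `h` lengthens the delay but reduces false alarms. Plain majority voting would return `"17"`, whereas the weighted vote returns `"42"` because only that trace reaches a detected Confidence Region. In practice `mu0`, `k`, and `h` would need calibrating, for example on held-out traces.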
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Chain-of-Thought Reasoning | AIME 25 | Accuracy | 78.13 | 12 |
| Chain-of-Thought Reasoning | AIME 24 | Accuracy | 77.71 | 12 |
| Chain-of-Thought Reasoning | GPQA Diamond | Accuracy | 67.68 | 12 |
| Chain-of-Thought Reasoning | Average (AIME 25, AIME 24, GPQA Diamond) | Accuracy | 73.3 | 12 |
| Multi-hop Question Answering | HotpotQA | EM | 34.1 | 10 |
| Code Execution Output Prediction | LiveCodeBench | Accuracy | 86.6 | 4 |
| Mathematical Question Answering | AMC23 | Accuracy | 89.36 | 2 |