
Rethinking Chain-of-Thought from the Perspective of Self-Training

About

Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs. Interestingly, we observe that CoT reasoning and self-training share a core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. Our framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning iteration module that dynamically refines the reasoning process and addresses the limitations of previous CoT approaches, i.e., over-reasoning and high similarity between consecutive reasoning iterations. Extensive experiments demonstrate that the proposed method achieves significant advantages in both performance and computational efficiency.

Zongqian Wu, Baoduo Xu, Ruochen Cui, Mengmeng Zhan, Xiaofeng Zhu, Lei Feng• 2024
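The adaptive reasoning iteration described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity measure (token-level Jaccard), the stopping threshold, and the `generate_step` callback are all illustrative assumptions, standing in for an LLM call and the paper's actual refinement criterion.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two reasoning strings
    (an illustrative proxy for similarity between CoT iterations)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def adaptive_cot(question: str, generate_step, max_iters: int = 5,
                 sim_threshold: float = 0.9) -> list[str]:
    """Iteratively refine a reasoning trace, stopping early when two
    consecutive iterations are nearly identical -- little new information
    is being added, so further iteration risks over-reasoning."""
    steps = [generate_step(question, None)]
    for _ in range(max_iters - 1):
        nxt = generate_step(question, steps[-1])
        if jaccard_similarity(steps[-1], nxt) >= sim_threshold:
            break  # consecutive iterations too similar: stop refining
        steps.append(nxt)
    return steps

# Toy generator standing in for an LLM; it converges after one refinement.
def toy_generate(question, prev):
    if prev is None:
        return "step one: restate the problem"
    return "step two: compute the answer"

trace = adaptive_cot("What is 2+2?", toy_generate)
print(len(trace))  # → 2: the loop halts once iterations stabilize
```

The early-stopping check is the key design point: it targets the two failure modes named in the abstract, cutting iteration once successive reasoning passes stop contributing new content.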

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reasoning | GSM8K | Accuracy | 0.87 | 106 |
| Symbolic Reasoning | Letter | Accuracy | 72.67 | 67 |
| Symbolic Reasoning | Last Letter Concatenation | Accuracy | 72.67 | 58 |
| Algorithmic Reasoning | MATH | Accuracy | 72.2 | 46 |
| Reasoning | Bamboogle | Accuracy | 59 | 46 |
| Mathematical Reasoning | GSM-Hard | Accuracy | 39.8 | 46 |
| Symbolic Reasoning | COIN | Accuracy | 75.5 | 45 |
| Reasoning | StrategyQA | Accuracy | 68.75 | 40 |
| Domain-specific Reasoning | LegalBench | Accuracy | 55.79 | 33 |
| Mathematical Reasoning | GSM-Hard | Accuracy | 46.2 | 28 |

Showing 10 of 16 rows
