
Rethinking Chain-of-Thought from the Perspective of Self-Training

About

Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs. Interestingly, we observe that CoT reasoning and self-training share a core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. Our framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning iteration module that dynamically refines the reasoning process and addresses the limitations of previous CoT approaches, i.e., over-reasoning and high similarity between consecutive reasoning iterations. Extensive experiments demonstrate that the proposed method achieves significant advantages in both performance and computational efficiency.

Zongqian Wu, Baoduo Xu, Ruochen Cui, Mengmeng Zhan, Xiaofeng Zhu, Lei Feng• 2024
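The adaptive reasoning iteration described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity measure (token-level Jaccard), the stopping threshold, and the `generate_step` callback are all illustrative assumptions, standing in for an LLM call and the paper's actual refinement criterion.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two reasoning strings
    (an illustrative proxy for similarity between CoT iterations)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def adaptive_cot(question: str, generate_step, max_iters: int = 5,
                 sim_threshold: float = 0.9) -> list[str]:
    """Iteratively refine a reasoning trace, stopping early when two
    consecutive iterations are nearly identical -- little new information
    is being added, so further iteration risks over-reasoning."""
    steps = [generate_step(question, None)]
    for _ in range(max_iters - 1):
        nxt = generate_step(question, steps[-1])
        if jaccard_similarity(steps[-1], nxt) >= sim_threshold:
            break  # consecutive iterations too similar: stop refining
        steps.append(nxt)
    return steps

# Toy generator standing in for an LLM; it converges after one refinement.
def toy_generate(question, prev):
    if prev is None:
        return "step one: restate the problem"
    return "step two: compute the answer"

trace = adaptive_cot("What is 2+2?", toy_generate)
print(len(trace))  # → 2: the loop halts once iterations stabilize
```

The early-stopping check is the key design point: it targets the two failure modes named in the abstract, cutting iteration once successive reasoning passes stop contributing new content.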

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reasoning | GSM8K | Accuracy | 0.87 | 106 |
| Symbolic Reasoning | Letter | Accuracy | 72.67 | 67 |
| Symbolic Reasoning | Last Letter Concatenation | Accuracy | 72.67 | 58 |
| Algorithmic Reasoning | MATH | Accuracy | 72.2 | 46 |
| Reasoning | Bamboogle | Accuracy | 59 | 46 |
| Mathematical Reasoning | GSM-Hard | Accuracy | 39.8 | 46 |
| Symbolic Reasoning | COIN | Accuracy | 75.5 | 45 |
| Reasoning | StrategyQA | Accuracy | 68.75 | 40 |
| Domain-specific Reasoning | LegalBench | Accuracy | 55.79 | 33 |
| Mathematical Reasoning | GSM-Hard | Accuracy | 46.2 | 28 |

Showing 10 of 16 rows
