Chain-of-Thought Reasoning Without Prompting
About
In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often involve manually intensive prompt engineering. Our study takes a novel approach by asking: Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the *decoding* process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' *intrinsic* reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding effectively elicits reasoning capabilities from language models, which were previously obscured by standard greedy decoding.
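The decoding idea described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: branching happens only at the first decoding step, each branch is then continued greedily, the answer is taken to be the last numeric token, and confidence is measured as the probability gap between the chosen token and its best rival at the answer position. The `TOY` table and every function name are hypothetical stand-ins for a real language model, and the probabilities are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]                       # token -> probability
NextTokenFn = Callable[[Tuple[str, ...]], Dist]
EOS = "<eos>"

# Hypothetical toy "model": next-token distributions keyed by prefix.
# The greedy first token ("5") answers directly; the runner-up ("3")
# opens a short reasoning chain that ends in a different answer.
TOY: Dict[Tuple[str, ...], Dist] = {
    (): {"5": 0.55, "3": 0.45},
    ("5",): {EOS: 1.0},
    ("3",): {"+": 0.9, EOS: 0.1},
    ("3", "+"): {"5": 0.9, "2": 0.1},
    ("3", "+", "5"): {"=": 0.95, EOS: 0.05},
    ("3", "+", "5", "="): {"8": 0.97, "5": 0.03},
    ("3", "+", "5", "=", "8"): {EOS: 1.0},
}


def toy_next_token(prefix: Tuple[str, ...]) -> Dist:
    return TOY[prefix]


def margin(dist: Dist, chosen: str) -> float:
    """Probability of `chosen` minus the best competing probability."""
    rival = max((p for t, p in dist.items() if t != chosen), default=0.0)
    return dist[chosen] - rival


def decode_branch(next_token: NextTokenFn, first: str,
                  max_len: int = 16) -> Tuple[List[str], List[float]]:
    """Force `first` as the first token, then continue greedily."""
    tokens = [first]
    margins = [margin(next_token(()), first)]
    while len(tokens) < max_len:
        dist = next_token(tuple(tokens))
        best = max(dist, key=dist.get)
        if best == EOS:
            break
        tokens.append(best)
        margins.append(margin(dist, best))
    return tokens, margins


def answer_confidence(tokens: List[str], margins: List[float]) -> float:
    """Margin at the last numeric token (assumed here to be the answer)."""
    for tok, m in zip(reversed(tokens), reversed(margins)):
        if tok.isdigit():
            return m
    return 0.0


def cot_decode(next_token: NextTokenFn,
               k: int = 2) -> List[Tuple[List[str], float]]:
    """Branch on the top-k first tokens; rank paths by answer confidence."""
    first_dist = next_token(())
    starts = sorted(first_dist, key=first_dist.get, reverse=True)[:k]
    paths = []
    for first in starts:
        tokens, margins = decode_branch(next_token, first)
        paths.append((tokens, answer_confidence(tokens, margins)))
    return sorted(paths, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    for tokens, conf in cot_decode(toy_next_token, k=2):
        print(" ".join(tokens), f"(confidence {conf:.2f})")
```

With `k=1` the sketch reduces to plain greedy decoding and returns only the direct answer. With `k=2` the second branch surfaces the step-by-step path, and in this toy setup that path also carries the larger margin at its final answer, so it is ranked first.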
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy (Acc) | 89.76 | 337 |
| Knowledge Reasoning | MMLU-Pro | Accuracy | 58.81 | 120 |
| Open-domain Question Answering | HotpotQA | Accuracy | 82.09 | 73 |
| Question Answering | GSM8K | Accuracy | 36.3 | 36 |
| Code Generation | APPS Intermediate | Pass Rate | 55.27 | 32 |
| Code Generation | APPS Introductory | pass@1 | 32 | 25 |
| Question Answering | Sports Understanding | Accuracy | 68.4 | 24 |
| Question Answering | MultiArith | Accuracy | 72.3 | 24 |
| Free Question Answering | Auto categorization context-free | BLEU Score | 8 | 24 |
| Free Question Answering | SQuAD contextual v1.1 | BLEU | 5.8 | 24 |