
Automatic Chain of Thought Prompting in Large Language Models

About

Large language models (LLMs) can perform complex reasoning by generating intermediate reasoning steps. Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has two major paradigms. One leverages a simple prompt like "Let's think step by step" to facilitate step-by-step thinking before answering a question. The other uses a few manual demonstrations one by one, each composed of a question and a reasoning chain that leads to an answer. The superior performance of the second paradigm hinges on the hand-crafting of task-specific demonstrations one by one. We show that such manual efforts may be eliminated by leveraging LLMs with the "Let's think step by step" prompt to generate reasoning chains for demonstrations one by one, i.e., let's think not just step by step, but also one by one. However, these generated chains often come with mistakes. To mitigate the effect of such mistakes, we find that diversity matters for automatically constructing demonstrations. We propose an automatic CoT prompting method: Auto-CoT. It samples questions with diversity and generates reasoning chains to construct demonstrations. On ten public benchmark reasoning tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of the CoT paradigm that requires manual designs of demonstrations. Code is available at https://github.com/amazon-research/auto-cot
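The abstract describes a pipeline: sample questions with diversity, generate a reasoning chain for each sampled question with the zero-shot "Let's think step by step" prompt, and use the resulting (question, chain) pairs as few-shot demonstrations. A minimal sketch of that pipeline is below. Note the assumptions: the real Auto-CoT uses Sentence-BERT embeddings with k-means clustering and calls GPT-3; here a toy bag-of-words embedding, a greedy farthest-point selection as a diversity proxy, and a caller-supplied `llm` function stand in so the sketch is self-contained.

```python
# Hedged sketch of the Auto-CoT pipeline described in the abstract:
#   1) embed the candidate questions,
#   2) select a diverse subset (the paper clusters with k-means over
#      Sentence-BERT embeddings; we approximate with farthest-point greedy
#      selection over bag-of-words vectors),
#   3) generate each selected question's reasoning chain with the
#      zero-shot "Let's think step by step" prompt,
#   4) return the (question, chain) pairs as few-shot demonstrations.
import math
from collections import Counter

def embed(question: str) -> Counter:
    """Toy bag-of-words embedding (stand-in for Sentence-BERT)."""
    return Counter(question.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def diverse_sample(questions, k):
    """Greedily pick k questions, each maximally dissimilar to those
    already chosen -- a simple proxy for cluster-based diversity."""
    chosen = [questions[0]]
    while len(chosen) < k:
        best = max(
            (q for q in questions if q not in chosen),
            # smallest worst-case similarity to the chosen set
            key=lambda q: -max(cosine(embed(q), embed(c)) for c in chosen),
        )
        chosen.append(best)
    return chosen

def zero_shot_cot(llm, question: str) -> str:
    """Elicit a reasoning chain with the zero-shot CoT prompt."""
    return llm(f"Q: {question}\nA: Let's think step by step.")

def build_demonstrations(llm, questions, k=4):
    """Construct k (question, reasoning chain) demonstrations."""
    return [(q, zero_shot_cot(llm, q)) for q in diverse_sample(questions, k)]
```

Usage: pass any callable that maps a prompt string to model text, e.g. `build_demonstrations(my_gpt3_call, pool_of_questions, k=8)`; the returned pairs are then prepended to the test question as the few-shot prompt. The diversity step is what the paper argues mitigates errors in the auto-generated chains: mistakes tend to cluster, so demonstrations drawn from different clusters are less likely to share the same error.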

Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola • 2022

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Visual Question Answering | VQA v2 | Accuracy: 66.79 | 1165
Visual Question Answering | TextVQA | Accuracy: 63.64 | 1117
Mathematical Reasoning | GSM8K | Accuracy: 71.4 | 983
Mathematical Reasoning | MATH (test) | Overall Accuracy: 72.78 | 433
Commonsense Reasoning | CSQA | Accuracy: 79.4 | 366
Multi-hop Question Answering | 2WikiMultihopQA | -- | 278
Visual Question Answering | OK-VQA | Accuracy: 48.13 | 224
Sentiment Classification | SST2 (test) | Accuracy: 88.65 | 214
Visual Question Answering | ScienceQA | Accuracy: 74.09 | 210
Multi-hop Question Answering | HotpotQA (test) | -- | 198

(Showing 10 of 60 rows.)
