
Automatic Chain of Thought Prompting in Large Language Models

About

Large language models (LLMs) can perform complex reasoning by generating intermediate reasoning steps. Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has two major paradigms. One leverages a simple prompt like "Let's think step by step" to facilitate step-by-step thinking before answering a question. The other uses a few manual demonstrations one by one, each composed of a question and a reasoning chain that leads to an answer. The superior performance of the second paradigm hinges on the hand-crafting of task-specific demonstrations one by one. We show that such manual efforts may be eliminated by leveraging LLMs with the "Let's think step by step" prompt to generate reasoning chains for demonstrations one by one, i.e., let's think not just step by step, but also one by one. However, these generated chains often come with mistakes. To mitigate the effect of such mistakes, we find that diversity matters for automatically constructing demonstrations. We propose an automatic CoT prompting method: Auto-CoT. It samples questions with diversity and generates reasoning chains to construct demonstrations. On ten public benchmark reasoning tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of the CoT paradigm that requires manual designs of demonstrations. Code is available at https://github.com/amazon-research/auto-cot

Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola · 2022
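The abstract above describes a two-stage recipe: cluster the task's questions so the sampled demonstrations are diverse, pick a representative question per cluster, and let the model generate each reasoning chain with the Zero-Shot-CoT prompt. A minimal sketch of that pipeline follows; the toy character-count embedding and the stubbed LLM call are placeholders (the paper uses Sentence-BERT embeddings and GPT-3), and the shortest-question heuristic stands in for the paper's fuller selection criteria.

```python
# Hedged sketch of the Auto-CoT pipeline: diversity-based sampling plus
# Zero-Shot-CoT rationale generation. Embedding and LLM are stand-ins.
import math
import random


def embed(question):
    """Toy bag-of-letters embedding (placeholder for Sentence-BERT)."""
    vec = [0.0] * 26
    for ch in question.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means; returns a cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def zero_shot_cot(question):
    """Stub for querying an LLM with the Zero-Shot-CoT prompt."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    # A real call would return the model's generated rationale here.
    return prompt + " <model-generated rationale>"


def auto_cot(questions, k):
    """Build k CoT demonstrations from diverse question clusters."""
    vectors = [embed(q) for q in questions]
    labels = kmeans(vectors, k)
    demos = []
    for c in range(k):
        members = [q for q, l in zip(questions, labels) if l == c]
        if not members:
            continue  # a cluster can come up empty with toy data
        # Heuristic stand-in: prefer a short question per cluster
        # (the paper also filters on rationale length and step count).
        rep = min(members, key=len)
        demos.append(zero_shot_cot(rep))
    return demos
```

The resulting demonstrations would then be concatenated in front of the test question, so the final prompt follows the few-shot CoT paradigm without any hand-written rationales.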

Related benchmarks

Task                          | Dataset          | Result                  | Rank
Mathematical Reasoning        | GSM8K            | Accuracy 71.4           | 1362
Visual Question Answering     | VQA v2           | Accuracy 66.79          | 1362
Visual Question Answering     | TextVQA          | Accuracy 63.64          | 1285
Mathematical Reasoning        | MATH (test)      | Overall Accuracy 72.78  | 433
Multi-hop Question Answering  | 2WikiMultihopQA  | --                      | 387
Visual Question Answering     | ScienceQA        | Accuracy 74.09          | 370
Commonsense Reasoning         | CSQA             | Accuracy 79.4           | 366
Visual Question Answering     | OK-VQA           | Accuracy 48.13          | 260
Multi-hop Question Answering  | HotpotQA (test)  | --                      | 255
Sentiment Classification      | SST2 (test)      | Accuracy 88.65          | 233
(10 of 62 benchmark rows shown)

Other info

Code: https://github.com/amazon-research/auto-cot