Self-Consistency Improves Chain of Thought Reasoning in Language Models

About

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting by a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou • 2022
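
The decoding strategy described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each sampled reasoning path ends with a sentence of the form "The answer is X.", and it approximates marginalizing over reasoning paths with an unweighted majority vote over the extracted answers. The names `extract_answer`, `self_consistent_answer`, and `sample_path` are hypothetical; `sample_path` stands in for any function that returns one sampled chain-of-thought completion per call (for example, a language model decoded with temperature sampling).

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(reasoning_path: str) -> Optional[str]:
    """Return the final answer from a reasoning path.

    Assumes (for illustration only) that the path ends with
    'The answer is X.'; returns None if that pattern is absent.
    """
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", reasoning_path.strip())
    return match.group(1) if match else None


def self_consistent_answer(
    prompt: str,
    sample_path: Callable[[str], str],
    num_samples: int = 5,
) -> Optional[str]:
    """Sample several reasoning paths and return the most frequent answer."""
    answers = []
    for _ in range(num_samples):
        answer = extract_answer(sample_path(prompt))
        if answer is not None:  # skip paths with no parseable answer
            answers.append(answer)
    if not answers:
        return None
    # Majority vote: different reasoning paths that reach the same
    # final answer each count once toward that answer.
    return Counter(answers).most_common(1)[0][0]


# Usage with canned completions standing in for sampled model outputs.
canned_paths = iter([
    "16 - 3 - 4 = 9 eggs left. 9 * 2 = 18. The answer is 18.",
    "She sells 16 - 4 = 12 eggs. 12 * 2 = 24, plus 2. The answer is 26.",
    "3 + 4 = 7 eggs used. 16 - 7 = 9. 9 * 2 = 18. The answer is 18.",
    "She keeps 7 eggs, so 7 * 2 = 14. The answer is 14.",
    "Remaining eggs: 9. Revenue: 9 * 2 = 18. The answer is 18.",
])
result = self_consistent_answer(
    "hypothetical egg-selling question", lambda _prompt: next(canned_paths)
)
print(result)  # prints 18
```

Greedy decoding would commit to a single path, which may be one of the incorrect ones above; voting over several sampled paths lets the answer reached by the most paths win.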

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	HellaSwag	Accuracy 51.23	1896
Visual Question Answering	TextVQA	Accuracy 54.85	1453
Commonsense Reasoning	WinoGrande	Accuracy 64.1	1442
Visual Question Answering	VQA v2	Accuracy 66.04	1429
Mathematical Reasoning	GSM8K	Accuracy 91.8	1398
Code Generation	HumanEval	Pass@1 87.58	1043
Text-based Visual Question Answering	TextVQA	--	962
Mathematical Reasoning	GSM8K (test)	Accuracy 96	954
Question Answering	ARC Challenge	--	906
Mathematical Reasoning	MATH500 (test)	Accuracy 66	895

Showing 10 of 1215 rows
