Self-Consistency Improves Chain of Thought Reasoning in Language Models

About

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou • 2022
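
The decoding strategy described in the abstract (sample several reasoning paths, then keep the most consistent final answer) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes "most consistent" means a plain majority vote over final answers, and `sample_path`, `make_stub_sampler`, and the canned reasoning paths are hypothetical stand-ins for a real language-model sampler.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A reasoning path is modeled as (reasoning_text, final_answer).
Path = Tuple[str, str]


def self_consistency(
    sample_path: Callable[[str], Path],
    question: str,
    num_samples: int = 5,
) -> Tuple[str, float]:
    """Sample several reasoning paths and return the majority answer.

    Assumption: aggregation is an unweighted majority vote over the
    final answers; the reasoning text itself is discarded.
    """
    answers: List[str] = [sample_path(question)[1] for _ in range(num_samples)]
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / num_samples


def make_stub_sampler(paths: List[Path]) -> Callable[[str], Path]:
    """Hypothetical sampler that replays canned paths instead of calling a model."""
    iterator = iter(paths)

    def sample(question: str) -> Path:
        return next(iterator)

    return sample


# Canned paths standing in for diverse samples from a model (illustrative only).
canned_paths: List[Path] = [
    ("3 + 4 = 7, then 7 * 2 = 14", "14"),
    ("3 * 2 = 6 and 4 * 2 = 8, so 6 + 8 = 14", "14"),
    ("4 * 2 = 8, then 3 + 8 = 11", "11"),
    ("the sum 7 doubled is 14", "14"),
    ("3 + 4 = 7, then 7 + 2 = 9", "9"),
]

answer, share = self_consistency(
    make_stub_sampler(canned_paths), "What is (3 + 4) * 2?", num_samples=5
)
print(answer, share)
```

In this toy run three of the five sampled paths agree on "14", so that answer wins even though two paths contain arithmetic slips. A real setup would replace the stub with temperature-based sampling from a model and would need its own answer-extraction step.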

Related benchmarks

Task                                  Dataset         Result          Rank
Commonsense Reasoning                 HellaSwag       Accuracy 51.23  1896
Visual Question Answering             TextVQA         Accuracy 54.85  1453
Commonsense Reasoning                 WinoGrande      Accuracy 64.1   1442
Visual Question Answering             VQA v2          Accuracy 66.04  1429
Mathematical Reasoning                GSM8K           Accuracy 91.8   1398
Code Generation                       HumanEval       Pass@1 87.58    1043
Text-based Visual Question Answering  TextVQA         --              962
Mathematical Reasoning                GSM8K (test)    Accuracy 96     954
Question Answering                    ARC Challenge   --              906
Mathematical Reasoning                MATH500 (test)  Accuracy 66     895
Showing 10 of 1215 rows
