
Self-Consistency Improves Chain of Thought Reasoning in Language Models

About

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou• 2022
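The decoding strategy described in the abstract can be sketched in a few lines: sample several reasoning paths with temperature sampling, keep only each path's final answer, and take a majority vote. The sketch below uses a toy stand-in sampler (`toy_sampler`, a hypothetical stub) in place of an actual LLM call, and the `self_consistency` helper is an illustrative name, not an API from the paper.

```python
import random
from collections import Counter

def self_consistency(sample_path, n_samples=40):
    """Return the most consistent final answer across sampled reasoning paths.

    `sample_path` is any callable returning (reasoning_text, final_answer);
    in practice it would query an LLM with a chain-of-thought prompt at
    nonzero temperature, instead of greedy decoding.
    """
    answers = [sample_path()[1] for _ in range(n_samples)]
    # Marginalize out the reasoning paths: majority vote over final answers.
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for an LLM sampler: most sampled paths reach the correct
# answer 18, while some make arithmetic slips (purely illustrative numbers).
random.seed(0)

def toy_sampler():
    answer = random.choices([18, 26, 14], weights=[0.7, 0.2, 0.1])[0]
    return ("...step-by-step reasoning...", answer)

print(self_consistency(toy_sampler))
```

Even when individual samples disagree, the plurality answer is usually the correct one, which is why the paper reports gains over taking the single greedy path.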

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 51.23 | 1891 |
| Mathematical Reasoning | GSM8K | Accuracy | 91.8 | 1362 |
| Visual Question Answering | VQA v2 | Accuracy | 66.04 | 1362 |
| Visual Question Answering | TextVQA | Accuracy | 54.85 | 1285 |
| Commonsense Reasoning | WinoGrande | Accuracy | 64.1 | 1085 |
| Code Generation | HumanEval | Pass@1 | 87.58 | 1036 |
| Question Answering | ARC Challenge | -- | -- | 906 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 96 | 900 |
| Mathematical Reasoning | MATH | Accuracy | 61.6 | 882 |
| Multi-task Language Understanding | MMLU | Accuracy | 80.96 | 876 |

Showing 10 of 859 rows
