
Self-Consistency Improves Chain of Thought Reasoning in Language Models

About

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).
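The decoding strategy described in the abstract reduces, after marginalizing out the reasoning, to a majority vote over the final answers of independently sampled reasoning paths. A minimal sketch of that aggregation step follows; the language-model sampler itself is omitted, and the example paths are hypothetical stand-ins for stochastic decodes of the same chain-of-thought prompt:

```python
from collections import Counter

def self_consistency(sampled_paths):
    """Select the most consistent final answer across sampled reasoning paths.

    sampled_paths: list of (reasoning, answer) tuples, e.g. produced by
    temperature-sampling the same chain-of-thought prompt several times.
    Marginalizing out the reasoning text reduces to a majority vote
    over the extracted answers.
    """
    answers = [answer for _reasoning, answer in sampled_paths]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical sampled paths for an arithmetic question; note the paths
# differ (one even reaches a wrong answer), but the vote recovers "14".
paths = [
    ("3 + 4 = 7, then 7 * 2 = 14", "14"),
    ("double 4 to get 8, add 3 -> 11", "11"),
    ("(3 + 4) * 2 = 14", "14"),
]
print(self_consistency(paths))  # -> 14
```

In practice each path would come from one stochastic decode (greedy decoding yields only a single path, which is exactly what self-consistency replaces), and the answer would be parsed from the end of the generated text.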

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou • 2022

Related benchmarks

Task                                Dataset        Result            Rank
Commonsense Reasoning               HellaSwag      Accuracy 51.23    1460
Visual Question Answering           VQA v2         Accuracy 66.04    1165
Visual Question Answering           TextVQA        Accuracy 54.85    1117
Mathematical Reasoning              GSM8K          Accuracy 91.8     983
Code Generation                     HumanEval      Pass@1 87.58      850
Multi-task Language Understanding   MMLU           Accuracy 80.96    842
Mathematical Reasoning              GSM8K (test)   Accuracy 96       797
Commonsense Reasoning               WinoGrande     Accuracy 64.1     776
Language Understanding              MMLU           Accuracy 83.66    756
Mathematical Reasoning              GSM8K (test)   Accuracy 94.2     751

Showing 10 of 622 rows
