Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

About

Self-consistency (SC) is a widely used decoding strategy for chain-of-thought reasoning. Although it brings significant performance improvements across a variety of multi-step reasoning tasks, it is costly because it requires drawing a preset number of samples for every input. In this paper, we propose a simple and scalable sampling process, Early-Stopping Self-Consistency (ESC), that greatly reduces the cost of SC without sacrificing performance. On this basis, we further derive a control scheme for ESC that dynamically chooses the performance-cost balance for different tasks and models. To demonstrate ESC's effectiveness, we conducted extensive experiments on three popular categories of reasoning tasks (arithmetic, commonsense and symbolic reasoning) using language models of varying scales. The empirical results show that ESC reduces the average number of chain-of-thought samples by a significant margin on six benchmarks: MATH (-33.8%), GSM8K (-80.1%), StrategyQA (-76.8%), CommonsenseQA (-78.5%), Coin Flip (-84.2%) and Last Letters (-67.4%), while attaining comparable performance.
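The abstract does not spell out ESC's stopping criterion, so the following is only a minimal sketch under a stated assumption: samples are drawn in small windows, and sampling stops as soon as every answer in one window agrees; otherwise it continues up to a fixed budget. The final answer is then the majority vote over all samples drawn, as in standard self-consistency. The names `sample_answer`, `window_size` and `max_samples` are hypothetical; `sample_answer` stands in for drawing one chain-of-thought sample and extracting its final answer. The control scheme mentioned above is not modelled here.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple


def early_stopping_self_consistency(
    sample_answer: Callable[[], Hashable],  # hypothetical: one CoT sample -> final answer
    window_size: int = 5,                   # assumed window length
    max_samples: int = 40,                  # assumed sampling budget (the "preset size" of plain SC)
) -> Tuple[Hashable, int]:
    """Return (majority-vote answer, number of samples actually drawn).

    Sketch only: assumes an early-stopping rule that halts once all answers
    in the most recent window are identical.
    """
    if window_size < 1 or max_samples < 1:
        raise ValueError("window_size and max_samples must be positive")

    answers: List[Hashable] = []
    while len(answers) < max_samples:
        # Never exceed the overall budget with the last (possibly shorter) window.
        n = min(window_size, max_samples - len(answers))
        window = [sample_answer() for _ in range(n)]
        answers.extend(window)
        # Assumed stopping rule: the window is unanimous, so further samples
        # are unlikely to change the majority vote.
        if len(set(window)) == 1:
            break

    # Majority vote over every sample drawn so far, as in standard SC.
    answer, _ = Counter(answers).most_common(1)[0]
    return answer, len(answers)


# Usage with a scripted stand-in for an LLM sampler (hypothetical answers).
scripted = iter(["3", "4", "3", "3", "5"] + ["3"] * 5)
answer, used = early_stopping_self_consistency(
    lambda: next(scripted), window_size=5, max_samples=40
)
print(answer, used)  # 3 10
```

Here the first window disagrees, the second is unanimous, and sampling stops after 10 of the 40 budgeted samples. Plain SC corresponds to always drawing all `max_samples` samples; the saving comes entirely from inputs whose answers converge early.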

Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li • 2024

Related benchmarks

Task                                 | Dataset           | Metric   | Result | Rank
------------------------------------ | ----------------- | -------- | ------ | ----
Mathematical Reasoning               | GSM8K             | Accuracy | 97.04  | 499
Mathematical Reasoning               | MathQA            | Accuracy | 84.7   | 354
Reasoning                            | GPQA Diamond      | Accuracy | 45.69  | 185
Multimodal Reasoning                 | LogicVista        | Accuracy | 54.6   | 147
Mathematical Reasoning               | Omni-MATH         | Accuracy | 42.8   | 123
Mathematical Reasoning               | HMMT25            | Accuracy | 48.9   | 119
High-resolution Visual Understanding | HR-Bench-8K       | FSP      | 93     | 83
Financial Reasoning                  | FinQA             | Accuracy | 70.4   | 69
Visual Reasoning                     | V*Bench           | Accuracy | 87     | 62
Mathematical Reasoning               | MathVision (test) | Accuracy | 22.7   | 53

(10 of 62 rows shown.)
