Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
About
Self-consistency (SC) is a popular technique for improving the reasoning accuracy of large language models by aggregating multiple sampled outputs, but it comes at a high computational cost due to extensive sampling. We introduce a hybrid ensembling approach that leverages the complementary strengths of two distinct modes of reasoning: Chain-of-Thought (CoT) and Program-of-Thought (PoT). We describe a general framework for combining these two forms of reasoning in self-consistency, as well as particular strategies for both full sampling and early stopping. We show that CoT-PoT ensembling not only improves overall accuracy but also reduces the number of samples required for SC by a factor of 9.3. In particular, the majority of tasks (78.6%) can be addressed with only two samples, which has not been possible with any prior SC method.
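The two-sample early-stopping idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `sample_cot` and `sample_pot` callables, and the `max_samples` budget are hypothetical, and both the stopping rule (stop as soon as the first CoT answer and the first PoT answer agree) and the fallback (alternate the two modes, then take a plain majority vote) are assumptions made here for illustration, since the abstract does not spell out the exact strategy.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Tuple

# Hypothetical sampler type: each call draws one sample from the model in the
# given mode and returns its final answer, or None if no answer was produced.
Sampler = Callable[[], Optional[Hashable]]


def cot_pot_self_consistency(
    sample_cot: Sampler,
    sample_pot: Sampler,
    max_samples: int = 20,
) -> Tuple[Optional[Hashable], int]:
    """Return (answer, samples_used) under an assumed early-stopping rule."""
    # Draw one sample from each reasoning mode.
    first_cot = sample_cot()
    first_pot = sample_pot()
    answers = [first_cot, first_pot]

    # Assumed stopping rule: agreement across the two modes ends sampling.
    if first_cot is not None and first_cot == first_pot:
        return first_cot, 2

    # Assumed fallback: keep alternating CoT and PoT up to the budget.
    while len(answers) < max_samples:
        sampler = sample_cot if len(answers) % 2 == 0 else sample_pot
        answers.append(sampler())

    # Plain majority vote over all answers that were actually produced.
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None, len(answers)
    return votes.most_common(1)[0][0], len(answers)
```

In practice `sample_cot` would wrap a model call that parses the final answer out of a natural-language rationale, and `sample_pot` would wrap a call that generates a program and executes it. When the two modes agree on the first draw, the function returns after exactly two samples, which is the case the abstract reports for 78.6% of tasks.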
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy (Acc) | 96.1 | 337 |
| Mathematical Reasoning | TabMWP | Accuracy | 88.4 | 203 |
| Financial Reasoning | FinQA | Accuracy | 72.1 | 69 |
| Mathematical Reasoning | GSM8K | Accuracy | 96.2 | 7 |
| Mathematical Reasoning | MATH | Accuracy | 76.1 | 7 |
| Mathematical Reasoning | SVAMP | Accuracy | 95.6 | 7 |
| Mathematical Reasoning | FinQA | Accuracy | 72.2 | 7 |
| Table-based Mathematical Reasoning | TabMWP | Accuracy | 88.4 | 7 |