Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

About

Self-consistency (SC) is a popular technique for improving the reasoning accuracy of large language models by aggregating multiple sampled outputs, but its extensive sampling makes it computationally expensive. We introduce a hybrid ensembling approach that leverages the complementary strengths of two distinct modes of reasoning: Chain-of-Thought (CoT) and Program-of-Thought (PoT). We describe a general framework for combining these two forms of reasoning in self-consistency, as well as particular strategies for both full sampling and early stopping. We show that CoT-PoT ensembling not only improves overall accuracy, but also reduces the number of samples required for SC by a factor of 9.3. In particular, the majority of tasks (78.6%) can be addressed with only two samples, which has not been possible with any prior SC method.
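The early-stopping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopping rule (stop as soon as the first CoT answer and the first PoT answer agree), the fallback (alternate the two modes up to a budget, then take a majority vote), and the names `cot_pot_vote`, `sample_cot`, `sample_pot`, and `max_samples` are all assumptions made for illustration. The two samplers are hypothetical callables that each return one final answer as a string, or `None` when no answer could be extracted (for example, when a generated program fails to run).

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Optional, Tuple

Answer = Optional[str]


def cot_pot_vote(
    sample_cot: Callable[[], Answer],
    sample_pot: Callable[[], Answer],
    max_samples: int = 10,
) -> Tuple[Answer, int]:
    """Return (answer, number_of_samples_used).

    Assumed rule, not taken from the paper: if the first CoT answer and
    the first PoT answer agree, stop after two samples. Otherwise keep
    alternating CoT and PoT samples until max_samples is reached and
    return the majority answer.
    """
    first_cot = sample_cot()
    first_pot = sample_pot()
    answers = [first_cot, first_pot]

    # Early stop: the two reasoning modes agree on a valid answer.
    if first_cot is not None and first_cot == first_pot:
        return first_cot, len(answers)

    # Fallback: alternate the two modes until the budget is spent.
    samplers = cycle([sample_cot, sample_pot])
    while len(answers) < max_samples:
        answers.append(next(samplers)())

    # Majority vote over valid answers only.
    valid = [a for a in answers if a is not None]
    if not valid:
        return None, len(answers)
    return Counter(valid).most_common(1)[0][0], len(answers)
```

In this sketch, agreement between the two modes is treated as sufficient evidence to stop, which is what would allow a task to finish after only two samples. Any disagreement, or a failed sample, falls back to ordinary majority voting over the mixed pool.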

Raman Saparkhan, Majd Hawasly, Md Rizwan Parvez, Mohammad Raza • 2026

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	GSM8K	Accuracy (Acc)	96.1	352
Mathematical Reasoning	TabMWP	Accuracy	88.4	210
Financial Reasoning	FinQA	Accuracy	72.1	69
Mathematical Reasoning	GSM8K	Accuracy	96.2	7
Mathematical Reasoning	MATH	Accuracy	76.1	7
Mathematical Reasoning	SVAMP	Accuracy	95.6	7
Mathematical Reasoning	FinQA	Accuracy	72.2	7
Table-based Mathematical Reasoning	TabMWP	Accuracy	88.4	7

