Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
About
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency: poll the LLM multiple times and output the most frequent solution. Existing Self-Consistency techniques always generate a constant number of samples per question, whereas a better approach would be to distribute the available budget non-uniformly, based on the amount of agreement among the samples generated so far. In response, we introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question using a lightweight stopping criterion. Our experiments over 17 reasoning and code generation datasets and three LLMs demonstrate that Adaptive-Consistency reduces the sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%. Our code and data are available at https://www.sample-step-by-step.info
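The abstract names the stopping criterion only as "lightweight", so the sketch below is an illustration under stated assumptions rather than the authors' implementation. It contrasts fixed-budget Self-Consistency (always draw `k` samples, then take a majority vote) with an adaptive loop that draws one sample at a time and stops once the current leader looks stable. The stopping rule is one plausible choice that we assume here: treat the leader's share among the top two answers as Beta(v1 + 1, v2 + 1) distributed and stop when P(leader is truly ahead) reaches a threshold. The function names, `max_samples=40`, and `threshold=0.95` are hypothetical, and `sample` stands in for a single LLM call that returns a normalized final answer.

```python
import math
from collections import Counter
from itertools import cycle
from typing import Callable, Hashable, Tuple


def prob_majority_leads(v1: int, v2: int) -> float:
    """Assumed stopping statistic: P(p1 > p2) if p1 / (p1 + p2) ~ Beta(v1 + 1, v2 + 1).

    v1 and v2 are the vote counts of the two most frequent answers. For integer
    parameters the Beta tail reduces to a binomial sum:
    P(X > 1/2) = sum_{j=0}^{v1} C(n, j) / 2**n, with n = v1 + v2 + 1.
    """
    n = v1 + v2 + 1
    return sum(math.comb(n, j) for j in range(v1 + 1)) / 2**n


def self_consistency(sample: Callable[[], Hashable], k: int) -> Hashable:
    """Fixed-budget Self-Consistency: draw k answers, return the most frequent."""
    return Counter(sample() for _ in range(k)).most_common(1)[0][0]


def adaptive_consistency(
    sample: Callable[[], Hashable],
    max_samples: int = 40,  # hypothetical per-question budget cap
    threshold: float = 0.95,  # hypothetical confidence needed to stop early
) -> Tuple[Hashable, int]:
    """Draw answers one at a time and stop early once the leader looks stable.

    Returns (majority answer, number of samples actually used).
    """
    counts: Counter = Counter()
    for used in range(1, max_samples + 1):
        counts[sample()] += 1
        top = counts.most_common(2)
        v1 = top[0][1]
        v2 = top[1][1] if len(top) > 1 else 0
        # Assumed stopping rule: stop when the leader is ahead with high probability.
        if prob_majority_leads(v1, v2) >= threshold:
            return top[0][0], used
    # Budget exhausted: fall back to a plain majority vote (ties keep first-seen order).
    return counts.most_common(1)[0][0], max_samples


# A sampler that always agrees stops after only 4 draws.
print(adaptive_consistency(lambda: "42"))  # ('42', 4)

# A sampler that alternates between two answers never becomes confident
# and uses the whole budget.
split = cycle(["A", "B"])
print(adaptive_consistency(lambda: next(split)))  # ('A', 40)
```

The savings in such a scheme come from easy questions: when early samples agree, the loop exits after a handful of calls, while contested questions still receive the full budget.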
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 97.04 | 351 |
| Reasoning | GPQA Diamond | Accuracy | 45.69 | 88 |
| Mathematical Reasoning | HMMT25 | Accuracy | 48.8 | 78 |
| Mathematical Reasoning | Omni-MATH | Accuracy | 43 | 68 |
| Reasoning | AIME 25 | Accuracy | 76.7 | 40 |
| General Knowledge Reasoning | MMLU-Pro | Accuracy | 75.72 | 31 |
| Mathematical Reasoning | MATH500 | Accuracy | 83.8 | 30 |
| Science Question Answering | GPQA | Memory Ratio | 0.21 | 24 |
| Mathematical Reasoning | AMC | C_mem (Ratio) | 0.14 | 24 |
| Mathematical Reasoning | MATH500 | Memory Usage Ratio | 18 | 24 |