Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
About
Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to "think about how to think". It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach uses contextual multi-armed bandits (CMABs) to learn an adaptive policy that evaluates the current state of the LLM's reasoning and selects, during inference, the strategy most likely to lead to a successful outcome, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps the model avoid exploring unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9-12% in accuracy while reducing inference time by 28-35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
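The abstract states that the strategy-selection policy is learned with contextual multi-armed bandits but does not give the algorithm, the context features, or the reward. The snippet below is a minimal sketch under those gaps, assuming the standard disjoint LinUCB algorithm (a swapped-in, well-known CMAB method, not necessarily the one the authors use). The arm names in `STRATEGIES`, the one-hot context built by `featurize`, and the toy `simulated_reward` are all hypothetical stand-ins for the LLM's reasoning state and for a real progress signal.

```python
import numpy as np

# Hypothetical arm set, loosely following the strategies named in the abstract.
STRATEGIES = ["continue", "backtrack", "switch_approach", "restart"]


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression reward model per arm."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha  # width of the exploration bonus
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrix (ridge prior = I)
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm sum of reward * context

    def select(self, x: np.ndarray, explore: bool = True) -> int:
        """Return the arm with the highest upper confidence bound (plain mean if explore=False)."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge estimate of this arm's reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x) if explore else 0.0
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Fold one observed (context, reward) pair into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def featurize(error_detected: bool) -> np.ndarray:
    # Hypothetical one-hot context: [no error so far, error detected].
    return np.array([0.0, 1.0]) if error_detected else np.array([1.0, 0.0])


def simulated_reward(arm: int, error_detected: bool) -> float:
    # Toy stand-in for "did this strategy lead to progress": backtracking pays off
    # after an error, continuing pays off otherwise.
    best = STRATEGIES.index("backtrack") if error_detected else STRATEGIES.index("continue")
    return 1.0 if arm == best else 0.0


rng = np.random.default_rng(0)
policy = LinUCB(n_arms=len(STRATEGIES), dim=2)
for _ in range(200):
    error = bool(rng.random() < 0.5)
    x = featurize(error)
    arm = policy.select(x)
    policy.update(arm, x, simulated_reward(arm, error))

print(STRATEGIES[policy.select(featurize(True), explore=False)])   # backtrack
print(STRATEGIES[policy.select(featurize(False), explore=False)])  # continue
```

In a real system the context vector would summarize the reasoning trace so far and the reward would come from task progress or correctness; the one-hot encoding here is chosen only so that each context's reward is learned independently and the toy run is easy to check.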
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 2024 | Pass@1 Accuracy | 55.5 | 236 |
| Math | GSM8K | Accuracy | 0.87 | 216 |
| Mathematical Reasoning | Game of 24 | Accuracy | 94 | 147 |
| Math | MATH 500 | Accuracy | 65.4 | 120 |
| Math Reasoning | AMC | Accuracy | 36.1 | 95 |
| Science Question Answering | GPQA | pass@1 Accuracy | 53.54 | 85 |
| Mathematical Reasoning | MATH 500 | Accuracy | 91.73 | 79 |
| Mathematical Reasoning | AIME 25 | pass@1 | 36.67 | 65 |
| Logical reasoning | Game of 24 | Accuracy | 81 | 31 |
| Mathematical Reasoning | GSM8K | Accuracy | 95.83 | 31 |