
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

About

Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to "think about how to think". It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy: it evaluates the current state of the LLM's reasoning and determines the strategy most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps avoid the exploration of unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9-12% in accuracy while reducing inference time by 28-35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
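To make the strategy-selection idea concrete, here is a minimal sketch of a contextual bandit choosing among the reasoning actions named in the abstract (backtrack, switch approach, restart, or continue). This is an illustrative epsilon-greedy stand-in, not the paper's actual CMAB policy; the strategy names, context buckets, and reward signal are all assumptions for the example.

```python
import random

# Hypothetical strategy arms mirroring the actions described in the abstract.
STRATEGIES = ["continue", "backtrack", "switch-approach", "restart"]


class StrategyBandit:
    """Epsilon-greedy contextual bandit sketch for meta-level strategy choice.

    The reasoning state is reduced to a discrete context bucket (e.g. a coarse
    summary such as "progressing" or "stuck"); each (context, strategy) pair
    keeps a running mean of observed rewards.
    """

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {}   # (context, strategy) -> number of pulls
        self.values = {}   # (context, strategy) -> mean observed reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best
        # estimated strategy for this context.
        if random.random() < self.epsilon:
            return random.choice(STRATEGIES)
        return max(STRATEGIES,
                   key=lambda s: self.values.get((context, s), 0.0))

    def update(self, context, strategy, reward):
        # Incremental running-mean update for the chosen arm.
        key = (context, strategy)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        mean = self.values.get(key, 0.0)
        self.values[key] = mean + (reward - mean) / n
```

In use, the meta-level loop would call `select` with a summary of the current reasoning trace, apply the chosen strategy to the LLM's next step, and feed a success signal back via `update`; the bandit gradually learns which intervention pays off in which state.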

Yuan Sui, Yufei He, Tri Cao, Simeng Han, Yulin Chen, Bryan Hooi · 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Science Question Answering | GPQA | Pass@1 Accuracy 53.54 | 85 |
| Mathematical Reasoning | AIME 25 | Pass@1 36.67 | 65 |
| Fermi Problem Solving | Fermi | Pass@1 Accuracy 38.67 | 24 |
| Multimodal Mathematical Reasoning | MathV | Pass@1 Accuracy 21.05 | 12 |
| Multimodal Maze Solving | Maze | Pass@1 Accuracy 30.5 | 8 |
