Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

About

Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to "think about how to think". It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy. The policy learns to evaluate the current state of the LLM's reasoning and to determine the strategy most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps avoid exploring unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9-12% in accuracy, while reducing inference time by 28-35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
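The abstract frames strategy selection as a contextual multi-armed bandit problem. The Python sketch below illustrates that idea with disjoint LinUCB, one standard CMAB algorithm; it is not the paper's implementation. The strategy names follow the abstract's examples (backtrack, switch approach, restart) plus an assumed "continue" arm, while the three-element context vector, the reward rule, and the choice of LinUCB itself are all assumptions made for illustration.

```python
import math

# Hypothetical action set: the abstract names backtracking, switching to a new
# approach, and restarting; the "continue" arm is an added assumption.
STRATEGIES = ["continue", "backtrack", "switch_approach", "restart"]


def _matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Per-strategy ridge-regression state for disjoint LinUCB."""

    def __init__(self, dim, ridge=1.0):
        # Store A^{-1} directly (A starts as ridge * I) and update it with
        # the Sherman-Morrison formula instead of re-inverting A.
        self.a_inv = [[(1.0 / ridge) if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def ucb(self, x, alpha):
        theta = _matvec(self.a_inv, self.b)          # ridge estimate A^{-1} b
        width = math.sqrt(max(0.0, _dot(x, _matvec(self.a_inv, x))))
        return _dot(theta, x) + alpha * width        # mean + exploration bonus

    def update(self, x, reward):
        a_inv_x = _matvec(self.a_inv, x)
        denom = 1.0 + _dot(x, a_inv_x)
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        for i, u in enumerate(a_inv_x):
            for j, w in enumerate(a_inv_x):
                self.a_inv[i][j] -= u * w / denom
        self.b = [b_i + reward * x_i for b_i, x_i in zip(self.b, x)]


class StrategySelector:
    """Chooses a reasoning strategy from a context vector via LinUCB."""

    def __init__(self, strategies, dim, alpha=1.0):
        self.strategies = list(strategies)
        self.alpha = alpha
        self.arms = {s: LinUCBArm(dim) for s in self.strategies}

    def select(self, context):
        # max() keeps the first strategy on ties, so list order breaks ties.
        return max(self.strategies,
                   key=lambda s: self.arms[s].ucb(context, self.alpha))

    def update(self, strategy, context, reward):
        self.arms[strategy].update(context, reward)


# Toy simulation with hypothetical features [bias, progress, error_detected]
# and a hypothetical reward: backtracking pays off after an error, and
# continuing pays off otherwise.
x_err = [1.0, 0.2, 1.0]
x_ok = [1.0, 0.8, 0.0]
selector = StrategySelector(STRATEGIES, dim=3)
for t in range(400):
    context = x_err if t % 2 == 0 else x_ok
    best = "backtrack" if context is x_err else "continue"
    choice = selector.select(context)
    selector.update(choice, context, 1.0 if choice == best else 0.0)

print(selector.select(x_err), selector.select(x_ok))
```

Disjoint LinUCB keeps a separate linear reward model per strategy, and `alpha` scales how much the selector favors strategies whose payoff in the current context is still uncertain. In a setup like the one the abstract describes, the context would be derived from the LLM's reasoning state rather than hand-written vectors as in this toy loop.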

Yuan Sui, Yufei He, Tri Cao, Simeng Han, Yulin Chen, Bryan Hooi • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Math | GSM8K | Accuracy | 0.87 | 206
Math Reasoning | AMC | Accuracy | 36.1 | 95
Math | MATH 500 | Accuracy | 65.4 | 86
Science Question Answering | GPQA | Pass@1 Accuracy | 53.54 | 85
Mathematical Reasoning | AIME 25 | Pass@1 | 36.67 | 65
Fermi Problem Solving | Fermi | Pass@1 Accuracy | 38.67 | 24
Math Q&A | Olympiad | Accuracy | 27.4 | 14
General Multiple-Choice | MMLU-Pro | Science Accuracy | 45.4 | 14
Multimodal Mathematical Reasoning | MathV | Pass@1 Accuracy | 21.05 | 12
Multimodal Maze Solving | Maze | Pass@1 Accuracy | 30.5 | 8