
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

About

Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to "think about how to think". It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy: it evaluates the current state of the LLM's reasoning and determines the strategy most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps avoid the exploration of unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9-12% in accuracy while reducing inference time by 28-35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
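To make the strategy-selection idea concrete, here is a minimal sketch of a contextual bandit choosing among the reasoning actions named in the abstract (backtrack, switch approach, restart, or continue). This is an illustrative epsilon-greedy stand-in, not the paper's actual CMAB policy; the strategy names, context buckets, and reward signal are all assumptions for the example.

```python
import random

# Hypothetical strategy arms mirroring the actions described in the abstract.
STRATEGIES = ["continue", "backtrack", "switch-approach", "restart"]


class StrategyBandit:
    """Epsilon-greedy contextual bandit sketch for meta-level strategy choice.

    The reasoning state is reduced to a discrete context bucket (e.g. a coarse
    summary such as "progressing" or "stuck"); each (context, strategy) pair
    keeps a running mean of observed rewards.
    """

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {}   # (context, strategy) -> number of pulls
        self.values = {}   # (context, strategy) -> mean observed reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best
        # estimated strategy for this context.
        if random.random() < self.epsilon:
            return random.choice(STRATEGIES)
        return max(STRATEGIES,
                   key=lambda s: self.values.get((context, s), 0.0))

    def update(self, context, strategy, reward):
        # Incremental running-mean update for the chosen arm.
        key = (context, strategy)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        mean = self.values.get(key, 0.0)
        self.values[key] = mean + (reward - mean) / n
```

In use, the meta-level loop would call `select` with a summary of the current reasoning trace, apply the chosen strategy to the LLM's next step, and feed a success signal back via `update`; the bandit gradually learns which intervention pays off in which state.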

Yuan Sui, Yufei He, Tri Cao, Simeng Han, Yulin Chen, Bryan Hooi · 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Science Question Answering | GPQA | Pass@1 Accuracy 53.54 | 85 |
| Mathematical Reasoning | AIME 25 | Pass@1 36.67 | 65 |
| Fermi Problem Solving | Fermi | Pass@1 Accuracy 38.67 | 24 |
| Multimodal Mathematical Reasoning | MathV | Pass@1 Accuracy 21.05 | 12 |
| Multimodal Maze Solving | Maze | Pass@1 Accuracy 30.5 | 8 |
