
Adaptive Uncertainty-Aware Tree Search for Robust Reasoning

About

Inference-time reasoning scaling has significantly advanced the capabilities of Large Language Models (LLMs) in complex problem-solving. A prevalent approach involves external search guided by Process Reward Models (PRMs). However, a fundamental limitation of this framework is the epistemic uncertainty of PRMs when evaluating reasoning paths that deviate from their training distribution. In this work, we conduct a systematic analysis of this challenge. We first provide empirical evidence that PRMs exhibit high uncertainty and unreliable scoring on out-of-distribution (OOD) samples. We then establish a theoretical framework proving that while standard search incurs linear regret accumulation, an uncertainty-aware strategy can achieve sublinear regret. Motivated by these findings, we propose Uncertainty-Aware Tree Search (UATS), a unified method that estimates uncertainty via Monte Carlo Dropout and dynamically allocates compute budget using a reinforcement learning-based controller. Extensive experiments demonstrate that our approach effectively mitigates the impact of OOD errors.
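To make the Monte Carlo Dropout idea mentioned above concrete, here is a minimal sketch that uses only the Python standard library. It is not the authors' implementation. The toy linear scorer, the function names (`score_with_dropout`, `mc_dropout_estimate`, `pick_step`), the penalty weight `beta`, and the "mean minus beta times standard deviation" selection rule are all assumptions made for illustration. The sketch shows only the general technique: run several stochastic forward passes with dropout left on, then treat the spread of the scores as an uncertainty estimate.

```python
import math
import random
import statistics


def score_with_dropout(features, weights, drop_prob, rng):
    """One stochastic forward pass of a toy linear 'reward model' (hypothetical).

    Each input feature is zeroed with probability drop_prob; surviving
    features are rescaled by 1 / (1 - drop_prob) (inverted dropout).
    """
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += w * x / keep
    return 1.0 / (1.0 + math.exp(-total))  # squash the score into (0, 1)


def mc_dropout_estimate(features, weights, drop_prob=0.2, passes=50, seed=0):
    """Return (mean score, standard deviation) over repeated dropout passes."""
    rng = random.Random(seed)
    samples = [
        score_with_dropout(features, weights, drop_prob, rng)
        for _ in range(passes)
    ]
    return statistics.fmean(samples), statistics.pstdev(samples)


def pick_step(candidates, weights, beta=1.0, drop_prob=0.2, passes=50):
    """Pick the candidate index with the highest mean - beta * std.

    The penalty term is an illustrative assumption, not the paper's rule.
    """
    best_idx, best_val = -1, -math.inf
    for idx, features in enumerate(candidates):
        mean, std = mc_dropout_estimate(features, weights, drop_prob, passes)
        value = mean - beta * std
        if value > best_val:
            best_idx, best_val = idx, value
    return best_idx


if __name__ == "__main__":
    weights = [1.0, 1.0]
    mean, std = mc_dropout_estimate([1.0, 2.0], weights, drop_prob=0.5, passes=200)
    print(f"mean score = {mean:.3f}, uncertainty (std) = {std:.3f}")
```

With `drop_prob=0.0` every pass is identical, so the standard deviation collapses to zero and the estimate reduces to an ordinary deterministic score. A real PRM would replace the toy scorer with a neural network whose dropout layers stay active at inference time.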

Zeen Song, Zihao Ma, Wenwen Qiang, Changwen Zheng, Gang Hua • 2026

Related benchmarks

Task                      Dataset           Result                Rank
Mathematical Reasoning    AIME              AIME Accuracy: 28.1   283
Mathematical Reasoning    MATH500 (full)    Accuracy: 88.8        111
