Adaptive Uncertainty-Aware Tree Search for Robust Reasoning

About

Inference-time reasoning scaling has significantly advanced the capabilities of Large Language Models (LLMs) in complex problem-solving. A prevalent approach involves external search guided by Process Reward Models (PRMs). However, a fundamental limitation of this framework is the epistemic uncertainty of PRMs when evaluating reasoning paths that deviate from their training distribution. In this work, we conduct a systematic analysis of this challenge. We first provide empirical evidence that PRMs exhibit high uncertainty and unreliable scoring on out-of-distribution (OOD) samples. We then establish a theoretical framework proving that while standard search incurs linear regret accumulation, an uncertainty-aware strategy can achieve sublinear regret. Motivated by these findings, we propose Uncertainty-Aware Tree Search (UATS), a unified method that estimates uncertainty via Monte Carlo Dropout and dynamically allocates compute budget using a reinforcement learning-based controller. Extensive experiments demonstrate that our approach effectively mitigates the impact of OOD errors.

Zeen Song, Zihao Ma, Wenwen Qiang, Changwen Zheng, Gang Hua • 2026
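
The abstract says UATS estimates PRM uncertainty via Monte Carlo Dropout. As a rough illustration of that general technique only, and not the authors' implementation, the sketch below keeps dropout active at inference time, runs several stochastic forward passes of a toy two-layer scorer, and reports the mean score together with its standard deviation as an uncertainty proxy. The function name `mc_dropout_score`, the toy network, the random weights, and all parameter values are assumptions made for this example.

```python
import numpy as np

def mc_dropout_score(features, w1, w2, n_samples=32, drop_prob=0.1, seed=0):
    """Return (mean, std) of a toy reward score over stochastic forward passes.

    Hypothetical stand-in for a PRM: a two-layer network with dropout on the
    hidden layer. The real PRM architecture is not described in the abstract.
    """
    rng = np.random.default_rng(seed)
    scores = np.empty(n_samples)
    for i in range(n_samples):
        hidden = np.maximum(features @ w1, 0.0)        # ReLU hidden layer
        keep = rng.random(hidden.shape) >= drop_prob   # dropout stays ON at inference
        hidden = hidden * keep / (1.0 - drop_prob)     # inverted-dropout rescaling
        logit = hidden @ w2
        scores[i] = 1.0 / (1.0 + np.exp(-logit))       # sigmoid maps to (0, 1)
    # Mean is the score estimate; std across passes is the uncertainty proxy.
    return scores.mean(), scores.std()

# Toy usage with random weights and a random "reasoning step" feature vector.
rng = np.random.default_rng(1)
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=16)
step_features = rng.normal(size=8)
mean_score, uncertainty = mc_dropout_score(step_features, w1, w2)
print(f"score={mean_score:.3f}, uncertainty={uncertainty:.3f}")
```

A larger spread across passes suggests the scorer is less certain about that step. How UATS turns this signal into a compute-allocation decision is handled by its reinforcement learning-based controller, which this sketch does not attempt to reproduce.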

Related benchmarks

Task	Dataset	Result	Rank
Mathematical Reasoning	AIME	AIME Accuracy: 28.1	288
Mathematical Reasoning	MATH500 (full)	Accuracy: 88.8	111
