Agentic Uncertainty Quantification

About

Although AI agents have demonstrated impressive capabilities in long-horizon reasoning, their reliability is severely hampered by the "Spiral of Hallucination," where early epistemic errors propagate irreversibly. Existing methods face a dilemma: uncertainty quantification (UQ) methods typically act as passive sensors, only diagnosing risks without addressing them, while self-reflection mechanisms suffer from continuous or aimless corrections. To bridge this gap, we propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which uses these explanations as rational cues to trigger targeted inference-time resolution only when necessary. This enables the agent to dynamically balance efficient execution and deep deliberation. Extensive experiments on closed-loop benchmarks and open-ended deep research tasks demonstrate that our training-free approach achieves superior performance and trajectory-level calibration. We believe this principled framework, AUQ, represents a significant step towards reliable agents.
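
The dual-process idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Step`, `auq_step`, `toy_propose`, and `toy_reflect`, and the fixed threshold of 0.7, are hypothetical and chosen only for illustration. The sketch assumes each proposed step carries a verbalized confidence and explanation, that every step is stored in memory so later decisions can see it (the System 1 role), and that reflection runs only when confidence falls below the threshold (the System 2 role).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    action: str
    confidence: float  # verbalized confidence in [0, 1]
    explanation: str   # verbalized reason for the stated confidence


# Hypothetical interfaces: in a real agent these would wrap LLM calls.
Propose = Callable[[str, List[Step]], Step]
Reflect = Callable[[str, List[Step], Step], Step]


def auq_step(obs: str, memory: List[Step], propose: Propose,
             reflect: Reflect, threshold: float = 0.7) -> Tuple[Step, bool]:
    """Run one agent step; reflect only when confidence is low."""
    # Fast path: the proposal can read earlier confidences and
    # explanations from memory instead of acting blindly.
    step = propose(obs, memory)
    reflected = False
    # Slow path: triggered only below the (assumed) threshold, with the
    # explanation available as a cue for what to resolve.
    if step.confidence < threshold:
        step = reflect(obs, memory, step)
        reflected = True
    # Store the final step, including its uncertainty, for later steps.
    memory.append(step)
    return step, reflected


def toy_propose(obs: str, memory: List[Step]) -> Step:
    # Stand-in policy: unsure whenever the observation looks ambiguous.
    if "ambiguous" in obs:
        return Step("guess", 0.4, "the goal could refer to two objects")
    return Step("act", 0.9, "the goal is clearly specified")


def toy_reflect(obs: str, memory: List[Step], doubtful: Step) -> Step:
    # Stand-in reflection: resolve the stated doubt and raise confidence.
    return Step("act after checking", 0.9, "resolved: " + doubtful.explanation)
```

Calling `auq_step("ambiguous goal", [], toy_propose, toy_reflect)` takes the slow path, while a clearly specified observation goes straight through. How the actual framework sets the trigger and uses the explanation is described in the paper, not here.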

Jiaxin Zhang, Prafulla Kumar Choubey, Kung-Hsiang Huang, Caiming Xiong, Chien-Sheng Wu • 2026

Related benchmarks

Task	Dataset	Result
Embodied decision-making	AlfWorld	Success Rate 74.3	51
Function Calling	BFCL v4	Multi-Turn Success Rate 23.5	32
Clarification Seeking	WebShop Clarification (test)	Success Rate 0.245	15
Decision Making	Webshop	Success Rate 42.9	15
Clarification Seeking	ALFWorld Clarification (test)	Success Rate 64.7	15
Tool-Calling Decision Making	When2Call (test)	Normalized Accuracy 56.35	15
Tool Use	ToolSandbox (test)	Overall Score 64.08	12
Open-Ended Deep Research	DeepResearch Bench Open-Ended	Overall Score 52.09	11
E-commerce environment navigation	Webshop	ECE (End-State) 0.185	7
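
For readers unfamiliar with the ECE metric in the last row, the following is a minimal sketch of the standard equal-width binned Expected Calibration Error. It assumes one confidence value and one binary success outcome per trajectory. Whether the paper's "End-State" variant uses this exact binning or number of bins is an assumption, and the function name is hypothetical.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               outcomes: Sequence[int],
                               n_bins: int = 10) -> float:
    """Binned ECE: weighted mean of |avg confidence - accuracy| per bin."""
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have equal length")
    n = len(confidences)
    if n == 0:
        return 0.0
    # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        # Weight each bin's calibration gap by its share of samples.
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

Lower is better: a value of 0 means stated confidence matches the observed success rate in every bin.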
