SAUP: Situation Awareness Uncertainty Propagation on LLM Agent

About

Large language models (LLMs) integrated into multistep agent systems enable complex decision-making processes across various applications. However, their outputs often lack reliability, making uncertainty estimation crucial. Existing uncertainty estimation methods primarily focus on final-step outputs, which fail to account for cumulative uncertainty over the multistep decision-making process and the dynamic interactions between agents and their environments. To address these limitations, we propose SAUP (Situation Awareness Uncertainty Propagation), a novel framework that propagates uncertainty through each step of an LLM-based agent's reasoning process. SAUP incorporates situational awareness by assigning situational weights to each step's uncertainty during the propagation. Our method, compatible with various one-step uncertainty estimation techniques, provides a comprehensive and accurate uncertainty measure. Extensive experiments on benchmark datasets demonstrate that SAUP significantly outperforms existing state-of-the-art methods, achieving up to 20% improvement in AUROC.
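The propagation idea described above can be sketched in a few lines. The abstract does not state the aggregation rule, so this sketch assumes a weighted root-mean-square over the steps; the function name `propagate_uncertainty` and both parameter names are hypothetical, and the per-step uncertainties and situational weights are assumed to come from whatever one-step estimator and weighting scheme is in use.

```python
import math
from typing import Sequence


def propagate_uncertainty(
    step_uncertainties: Sequence[float],
    situational_weights: Sequence[float],
) -> float:
    """Combine per-step uncertainties into one agent-level score.

    Assumption: a weighted root-mean-square is used as the aggregation
    rule. This is an illustrative choice, not confirmed by the abstract.
    """
    if len(step_uncertainties) != len(situational_weights):
        raise ValueError("need exactly one situational weight per step")
    if not step_uncertainties:
        raise ValueError("need at least one step")

    n = len(step_uncertainties)
    # Scale each step's uncertainty by its situational weight, then
    # take the root of the mean squared weighted uncertainty.
    total = sum(
        (w * u) ** 2 for u, w in zip(step_uncertainties, situational_weights)
    )
    return math.sqrt(total / n)


if __name__ == "__main__":
    # Three reasoning steps; the last one is weighted most heavily.
    print(propagate_uncertainty([0.2, 0.5, 0.4], [0.5, 1.0, 1.5]))
```

With all weights equal to 1 this reduces to a plain root-mean-square of the step uncertainties, so the weights are the only place where situational information enters the score.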

Qiwei Zhao, Xujiang Zhao, Yanchi Liu, Wei Cheng, Yiyou Sun, Mika Oishi, Takao Osaki, Katsushi Matsuda, Huaxiu Yao, Haifeng Chen • 2024

Related benchmarks

Task	Dataset	Result
Uncertainty Estimation	MMLU AutoGen (test)	AUROC 0.7193	16
Multi-hop Question Answering	MoreHopQA	AUROC 0.6242	16
Uncertainty Estimation	MoreHopQA Camel	AUROC 56.68	16
Uncertainty Estimation	MATH AutoGen (test)	AUROC 0.6334	16
Uncertainty Estimation	MATH Camel	AUROC 0.6078	16
Uncertainty Estimation	MMLU Camel	AUROC 0.5641	16
Mathematical Reasoning	MATH	AUROC 0.6477	16
Uncertainty Estimation	MoreHopQA AutoGen (test)	AUROC 54.88	16
Knowledge Synthesis	MMLU	AUROC 53.82	16
Uncertainty Quantification	MMLU OOD via Math Prompts	AUROC 62.01	4
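
For readers unfamiliar with the metric, AUROC for uncertainty estimation can be read as the probability that a randomly chosen incorrect answer receives a higher uncertainty score than a randomly chosen correct one. The sketch below computes it directly from that pairwise definition; the function and argument names are hypothetical, and it is not the evaluation code behind the numbers above.

```python
from typing import Sequence


def auroc(uncertainties: Sequence[float], is_incorrect: Sequence[bool]) -> float:
    """Pairwise AUROC: how well uncertainty ranks incorrect above correct.

    Ties between an incorrect and a correct answer count as half a win.
    """
    if len(uncertainties) != len(is_incorrect):
        raise ValueError("need exactly one label per uncertainty score")

    incorrect = [u for u, bad in zip(uncertainties, is_incorrect) if bad]
    correct = [u for u, bad in zip(uncertainties, is_incorrect) if not bad]
    if not incorrect or not correct:
        raise ValueError("need at least one correct and one incorrect answer")

    # Compare every incorrect answer against every correct answer.
    wins = 0.0
    for u_bad in incorrect:
        for u_good in correct:
            if u_bad > u_good:
                wins += 1.0
            elif u_bad == u_good:
                wins += 0.5
    return wins / (len(incorrect) * len(correct))


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.3, 0.4]
    labels = [True, False, True, False]  # True marks an incorrect answer
    print(auroc(scores, labels))
```

A score of 0.5 corresponds to uncertainty that carries no ranking signal, and 1.0 to uncertainty that separates incorrect from correct answers perfectly.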

