SAUP: Situation Awareness Uncertainty Propagation on LLM Agent
About
Large language models (LLMs) integrated into multistep agent systems enable complex decision-making across a variety of applications. However, their outputs often lack reliability, which makes uncertainty estimation crucial. Existing uncertainty estimation methods focus primarily on final-step outputs, and so fail to account for the uncertainty that accumulates over a multistep decision-making process and for the dynamic interactions between agents and their environments. To address these limitations, we propose SAUP (Situation Awareness Uncertainty Propagation), a novel framework that propagates uncertainty through each step of an LLM-based agent's reasoning process. SAUP incorporates situational awareness by assigning a situational weight to each step's uncertainty during propagation. Our method is compatible with various one-step uncertainty estimation techniques and provides a comprehensive and accurate uncertainty measure. Extensive experiments on benchmark datasets demonstrate that SAUP significantly outperforms existing state-of-the-art methods, achieving up to 20% improvement in AUROC.
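The propagation idea can be illustrated with a minimal sketch. The abstract does not give the aggregation rule or say how situational weights are computed, so everything below is an assumption for illustration: the function name `propagate_uncertainty`, the weighted root-mean-square aggregation, and the hand-picked weights are all hypothetical. The per-step uncertainties are assumed to come from some one-step estimator.

```python
import math
from typing import Sequence


def propagate_uncertainty(
    step_uncertainties: Sequence[float],
    situational_weights: Sequence[float],
) -> float:
    """Aggregate per-step uncertainties into one agent-level score.

    ASSUMPTION: a weighted root mean square is used here purely for
    illustration; the abstract does not specify the aggregation rule.
    """
    if len(step_uncertainties) != len(situational_weights):
        raise ValueError("need exactly one weight per step")
    if not step_uncertainties:
        raise ValueError("need at least one step")

    n = len(step_uncertainties)
    # Scale each step's uncertainty by its situational weight, then
    # combine all steps instead of looking only at the final one.
    weighted_squares = sum(
        (w * u) ** 2 for w, u in zip(situational_weights, step_uncertainties)
    )
    return math.sqrt(weighted_squares / n)


# Hypothetical three-step trajectory: the middle step is uncertain and
# is judged (by some hypothetical situational estimator) to matter most.
uncertainties = [0.10, 0.60, 0.05]
weights = [1.0, 1.5, 1.0]

final_step_only = uncertainties[-1]
propagated = propagate_uncertainty(uncertainties, weights)
```

In this toy trajectory a final-step-only estimate sees just the low value at the last step, while the propagated score also reflects the uncertain middle step, which is the gap the abstract describes.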
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Uncertainty Estimation | MMLU AutoGen (test) | AUROC 0.7193 | 16 |
| Multi-hop Question Answering | MoreHopQA | AUROC 0.6242 | 16 |
| Uncertainty Estimation | MoreHopQA Camel | AUROC 56.68 | 16 |
| Uncertainty Estimation | MATH AutoGen (test) | AUROC 0.6334 | 16 |
| Uncertainty Estimation | MATH Camel | AUROC 0.6078 | 16 |
| Uncertainty Estimation | MMLU Camel | AUROC 0.5641 | 16 |
| Mathematical Reasoning | MATH | AUROC 0.6477 | 16 |
| Uncertainty Estimation | MoreHopQA AutoGen (test) | AUROC 54.88 | 16 |
| Knowledge Synthesis | MMLU | AUROC 53.82 | 16 |
| Uncertainty Quantification | MMLU OOD via Math Prompts | AUROC 62.01 | 4 |