Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition

About

While Large Language Model-based Multi-Agent Systems (MAS) consistently outperform single-agent systems on complex tasks, their intricate interactions introduce critical reliability challenges arising from communication dynamics and role dependencies. Existing uncertainty quantification methods, typically designed for single-turn outputs, fail to address the unique complexities of MAS. Specifically, these methods struggle with three distinct challenges: cascading uncertainty in multi-step reasoning, the variability of inter-agent communication paths, and the diversity of communication topologies. To bridge this gap, we introduce MATU, a novel framework that quantifies uncertainty through tensor decomposition. MATU moves beyond analyzing final text outputs by representing entire reasoning trajectories as embedding matrices and organizing multiple execution runs into a higher-order tensor. By applying tensor decomposition, we disentangle and quantify distinct sources of uncertainty, offering a comprehensive reliability measure that generalizes across different agent structures. Comprehensive experiments show that MATU produces holistic and robust uncertainty estimates across diverse tasks and communication topologies.

Tiejin Chen, Huaiyuan Yao, Jia Chen, Evangelos E. Papalexakis, Hua Wei • 2026
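
The abstract describes stacking the embedding matrices of several execution runs into a higher-order tensor and decomposing it. The sketch below illustrates that general idea only; it is not the MATU algorithm, whose decomposition and scoring rule are not given on this page. It assumes every run yields a (steps, dim) embedding matrix of the same shape, and it swaps in a simple HOSVD-style measure: the fraction of energy outside the leading singular direction of the run-mode unfolding. The names `unfold`, `mode_energy_spread`, and `run_disagreement` are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_energy_spread(tensor, mode):
    """Fraction of squared singular-value energy outside the leading
    singular direction of the given mode unfolding (0 means rank one)."""
    singular_values = np.linalg.svd(unfold(tensor, mode), compute_uv=False)
    energy = singular_values ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0  # an all-zero tensor carries no variation
    return float(1.0 - energy[0] / total)


def run_disagreement(trajectories):
    """Illustrative uncertainty proxy (an assumption, not the paper's score).

    trajectories: list of (steps, dim) arrays, one per execution run.
    Returns a value in [0, 1); higher means the runs differ more.
    """
    tensor = np.stack(trajectories, axis=0)  # shape: (runs, steps, dim)
    return mode_energy_spread(tensor, mode=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(5, 8))
    identical_runs = [base.copy() for _ in range(4)]
    divergent_runs = [rng.normal(size=(5, 8)) for _ in range(4)]
    print("identical runs:", run_disagreement(identical_runs))
    print("divergent runs:", run_disagreement(divergent_runs))
```

Identical runs make the run-mode unfolding rank one, so the score is close to zero, while unrelated runs spread energy over several singular directions. The same `mode_energy_spread` call with `mode=1` or `mode=2` would probe variation along the step or embedding axis instead.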

Related benchmarks

Task | Dataset | Result | Rank
Knowledge Synthesis | MMLU | AUROC 59.25 | 16
Mathematical Reasoning | MATH | AUROC 0.7121 | 16
Multi-hop Question Answering | MoreHopQA | AUROC 0.6457 | 16
Uncertainty Estimation | MATH AutoGen (test) | AUROC 0.7544 | 16
Uncertainty Estimation | MoreHopQA AutoGen (test) | AUROC 63.92 | 16
Uncertainty Estimation | MMLU AutoGen (test) | AUROC 0.7315 | 16
Uncertainty Estimation | MATH Camel | AUROC 0.7354 | 16
Uncertainty Estimation | MoreHopQA Camel | AUROC 65.29 | 16
Uncertainty Estimation | MMLU Camel | AUROC 0.7149 | 16
Uncertainty Quantification | MMLU OOD via Math Prompts | AUROC 67.7 | 4
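
Every result above is an AUROC. As a rough illustration, the sketch below computes AUROC for an uncertainty score under a common convention: a higher score should flag an incorrect answer, and AUROC is the probability that a randomly chosen incorrect answer receives a higher score than a randomly chosen correct one, with ties counted as one half. This convention is an assumption, since the page does not state how the listed values were computed, and the function name `auroc` is hypothetical.

```python
import numpy as np


def auroc(scores, is_error):
    """Pairwise AUROC: P(score of an error > score of a correct answer).

    scores:   uncertainty scores, higher means less confident.
    is_error: truthy where the system's answer was wrong.
    """
    scores = np.asarray(scores, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)
    pos = scores[is_error]    # scores assigned to wrong answers
    neg = scores[~is_error]   # scores assigned to correct answers
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one error and one correct example")
    # Compare every (error, correct) pair; ties contribute one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


if __name__ == "__main__":
    # Two wrong answers (scores 0.9, 0.35) and two correct ones (0.4, 0.1):
    # three of the four (error, correct) pairs are ordered correctly.
    print(auroc([0.9, 0.4, 0.35, 0.1], [1, 0, 1, 0]))  # prints 0.75
```

The function returns a value between 0 and 1, where 0.5 corresponds to a score that ranks errors no better than chance. Multiplying by 100 gives the percentage form.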