DiscoUQ: Structured Disagreement Analysis for Uncertainty Quantification in LLM Agent Ensembles

About

Multi-agent LLM systems, where multiple prompted instances of a language model independently answer questions, are increasingly used for complex reasoning tasks. However, existing methods for quantifying the uncertainty of their collective outputs rely on shallow voting statistics that discard the rich semantic information in agents' reasoning. We introduce DiscoUQ, a framework that extracts and leverages the structure of inter-agent disagreement -- both linguistic properties (evidence overlap, argument strength, divergence depth) and embedding geometry (cluster distances, dispersion, cohesion) -- to produce well-calibrated confidence estimates. We propose three methods of increasing complexity: DiscoUQ-LLM (logistic regression on LLM-extracted structure features), DiscoUQ-Embed (logistic regression on embedding geometry), and DiscoUQ-Learn (a neural network combining all features). Evaluated on four diverse benchmarks (StrategyQA, MMLU, TruthfulQA, ARC-Challenge) with a 5-agent system using Qwen3.5-27B, DiscoUQ-LLM achieves an average AUROC of 0.802, outperforming the best baseline (LLM Aggregator, 0.791) while being substantially better calibrated (ECE 0.036 vs. 0.098). The learned features generalize across benchmarks with near-zero performance degradation and provide the largest improvements where they are most needed: in the ambiguous "weak disagreement" tier where simple vote counting fails.
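The abstract names embedding-geometry features (cluster distances, dispersion, cohesion) and reports calibration as ECE, but does not define either precisely. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the feature definitions (Euclidean distances to centroids, a majority/minority split by final answer), and the equal-width binning for ECE are all illustrative choices.

```python
from collections import Counter

import numpy as np


def geometry_features(answers, embeddings):
    """Hypothetical embedding-geometry features for one question.

    answers:    list of final answers, one per agent.
    embeddings: array-like of shape (n_agents, dim), one reasoning
                embedding per agent (the embedding model is an assumption).
    """
    emb = np.asarray(embeddings, dtype=float)
    majority, votes = Counter(answers).most_common(1)[0]
    in_major = np.array([a == majority for a in answers])

    centroid = emb.mean(axis=0)
    major_centroid = emb[in_major].mean(axis=0)

    # Dispersion: mean distance of all agents to the global centroid.
    dispersion = np.linalg.norm(emb - centroid, axis=1).mean()
    # Cohesion: mean distance of majority agents to their own centroid
    # (lower means a tighter majority cluster).
    cohesion = np.linalg.norm(emb[in_major] - major_centroid, axis=1).mean()
    # Cluster distance: majority centroid vs. minority centroid
    # (defined as 0 when the agents are unanimous).
    if in_major.all():
        cluster_distance = 0.0
    else:
        minor_centroid = emb[~in_major].mean(axis=0)
        cluster_distance = float(np.linalg.norm(major_centroid - minor_centroid))

    return {
        "vote_share": votes / len(answers),
        "dispersion": float(dispersion),
        "cohesion": float(cohesion),
        "cluster_distance": cluster_distance,
    }


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(correct, dtype=float)
    # Clip so that a confidence of exactly 1.0 falls in the last bin.
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(hit[mask].mean() - conf[mask].mean())
    return float(ece)
```

In a pipeline like the one the abstract describes, features of this kind would be computed per question and passed to a logistic regression that predicts whether the majority answer is correct; the predicted probabilities would then be scored with AUROC and ECE.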

Bo Jiang • 2026

Related benchmarks

Task	Dataset	Result
Question Answering	TriviaQA	BS (%) 10.47	65
Calibration	TriviaQA	--	39
Calibration	TruthfulQA	--	32
Question Answering	TruthfulQA	AUROC 0.871	14
Language Understanding	MMLU-Pro	Brier Score 24.02	11
Question Answering	TruthfulQA	Brier Score 15.31	11
Reasoning	BBH	Brier Score (BBH) 17.57	11
Calibration	GSM8K	ECE 2.43	11
Calibration	Mean macro-average across benchmarks	Expected Calibration Error (ECE) 7.08	11
Math Reasoning	GSM8K	Brier Score 4.46	11

