Uncertainty Quantification for LLMs through Minimum Bayes Risk: Bridging Confidence and Consistency

About

Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches, with two major types being particularly prominent: information-based methods, which focus on model confidence expressed as token probabilities, and consistency-based methods, which assess the semantic relationship between multiple outputs generated using repeated sampling. Several recent methods have combined these two approaches to boost UQ performance. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these combined methods underperform in certain tasks. Our work discusses a fundamental approach to constructing uncertainty measures that directly links uncertainty with the minimum Bayes risk achieved by LLM decoding. Building on these findings, we propose a novel approach to integrating model confidence with output consistency, resulting in a family of efficient and robust UQ methods. We evaluate our approach across various tasks such as question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.
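To make the link between uncertainty and minimum Bayes risk more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the toy similarity `token_overlap`, the helper names, and the multiplicative combination in `combined_uncertainty` are all hypothetical choices made for this example. The sketch estimates the expected loss of each sampled output against the other samples, takes the smallest value as the minimum Bayes risk (a consistency signal), and then scales it by a confidence term derived from a sequence log-probability.

```python
import math
from typing import Callable, Sequence, Tuple


def token_overlap(a: str, b: str) -> float:
    """Toy similarity in [0, 1]: Jaccard overlap of lowercased tokens.

    Hypothetical stand-in for a real semantic similarity model.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def estimated_risk(
    candidate: str,
    samples: Sequence[str],
    similarity: Callable[[str, str], float] = token_overlap,
) -> float:
    """Monte Carlo estimate of the expected loss (1 - similarity) of `candidate`.

    The expectation is approximated by averaging over `samples`, which are
    assumed to be drawn from the model by repeated sampling.
    """
    return sum(1.0 - similarity(candidate, s) for s in samples) / len(samples)


def min_bayes_risk(
    samples: Sequence[str],
    similarity: Callable[[str, str], float] = token_overlap,
) -> Tuple[str, float]:
    """Return the sample with the smallest estimated risk, and that risk.

    Each candidate is also compared with itself, which contributes zero loss.
    """
    risks = [estimated_risk(c, samples, similarity) for c in samples]
    best = min(range(len(samples)), key=risks.__getitem__)
    return samples[best], risks[best]


def combined_uncertainty(sequence_logprob: float, risk: float) -> float:
    """Assumed combination: (1 - sequence probability) scaled by the risk.

    This product form is chosen for illustration only.
    """
    return (1.0 - math.exp(sequence_logprob)) * risk


if __name__ == "__main__":
    outputs = ["Paris", "Paris", "paris", "Lyon"]
    answer, risk = min_bayes_risk(outputs)
    print(answer, risk, combined_uncertainty(math.log(0.5), risk))
```

In this sketch, outputs that agree with most other samples receive a low risk, and the confidence term raises or lowers that score depending on how much probability the model assigns to the chosen output. Any real system would replace `token_overlap` with a task-appropriate similarity measure.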

Roman Vashurin, Maiya Goloburda, Albina Ilina, Aleksandr Rubashevskii, Preslav Nakov, Artem Shelmanov, Maxim Panov • 2025

Related benchmarks

Task	Dataset	Result
Question Answering	TriviaQA	EM 59.4	182
Uncertainty Estimation	TriviaQA	--	111
Multi-answer Question Answering	MAQA	Hamming Distance 0.335	52
Multi-answer Question Answering	MAQA-ΔK−1	KL Divergence 0.411	48
Question Answering	TriviaQA	Exact Match AUC 81.3	28
Uncertainty Quantification	CNN/DailyMail	Hamming AUC 0.628	28
Uncertainty Quantification	MAQA-ΔK−1	KL Divergence AUC 0.721	28
Machine Translation	WMT19	COMET Score 0.292	28
Uncertainty Quantification	WMT19	COMET AUC 0.597	28
Summarization	CNN/DailyMail	Hamming Score -0.158	28

