Uncertainty Quantification for LLMs through Minimum Bayes Risk: Bridging Confidence and Consistency

About

Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches, of which two types are particularly prominent: information-based methods, which focus on model confidence expressed as token probabilities, and consistency-based methods, which assess the semantic relationship between multiple outputs generated by repeated sampling. Several recent methods have combined these two approaches to boost UQ performance. However, they sometimes fail to outperform much simpler baseline methods. Our work discusses a fundamental approach to constructing uncertainty measures that directly links uncertainty with the minimum Bayes risks achieved by LLM decoding. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform in certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency, leading to a family of efficient and robust UQ methods. We evaluate our approach across various tasks such as question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.
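To make the link between uncertainty and minimum Bayes risk more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a set of sampled outputs is already available, uses a simple word-overlap score as a stand-in for a semantic similarity model, and estimates the risk of each candidate as its average dissimilarity to the samples. The function names (`token_overlap`, `mbr_uncertainty`, `combined_uncertainty`) are hypothetical, and the multiplicative combination of confidence and consistency shown here is only one illustrative choice; the abstract does not specify the paper's exact formula.

```python
import math
from typing import Callable, Sequence, Tuple


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets.

    Assumption: a stand-in for whatever semantic similarity model is used in practice.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def mbr_uncertainty(
    samples: Sequence[str],
    similarity: Callable[[str, str], float] = token_overlap,
) -> Tuple[str, float]:
    """Return the candidate with the lowest estimated risk, and that risk.

    The risk of a candidate is the Monte Carlo average of (1 - similarity)
    against all sampled outputs; the minimum over candidates serves as the
    uncertainty score.
    """
    if not samples:
        raise ValueError("at least one sampled output is required")
    best, best_risk = samples[0], math.inf
    for cand in samples:
        risk = sum(1.0 - similarity(cand, s) for s in samples) / len(samples)
        if risk < best_risk:
            best, best_risk = cand, risk
    return best, best_risk


def combined_uncertainty(
    answer: str,
    answer_logprob: float,
    samples: Sequence[str],
    similarity: Callable[[str, str], float] = token_overlap,
) -> float:
    """Hypothetical combination: (1 - sequence probability) times inconsistency.

    Assumption: the product form is illustrative only, not the paper's method.
    """
    confidence_term = 1.0 - math.exp(answer_logprob)
    consistency_term = sum(1.0 - similarity(answer, s) for s in samples) / len(samples)
    return confidence_term * consistency_term


if __name__ == "__main__":
    sampled = ["Paris", "Paris", "London"]
    print(mbr_uncertainty(sampled))
    print(combined_uncertainty("Paris", math.log(0.5), sampled))
```

In this sketch, fully consistent samples yield a risk of zero regardless of the confidence term, while disagreement among samples raises the score; a real system would replace `token_overlap` with a task-appropriate similarity measure.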

Roman Vashurin, Maiya Goloburda, Albina Ilina, Aleksandr Rubashevskii, Preslav Nakov, Artem Shelmanov, Maxim Panov • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Question Answering | TriviaQA | EM 59.4 | 116 |
| Multi-answer Question Answering | MAQA-ΔK−1 | KL Divergence 0.411 | 48 |
| Uncertainty Estimation | TriviaQA | -- | 37 |
| Question Answering | TriviaQA | Exact Match AUC 81.3 | 28 |
| Uncertainty Quantification | CNN/DailyMail | Hamming AUC 0.628 | 28 |
| Uncertainty Quantification | MAQA-ΔK−1 | KL Divergence AUC 0.721 | 28 |
| Multi-answer Question Answering | MAQA | Hamming Distance 0.335 | 28 |
| Machine Translation | WMT19 | COMET Score 0.292 | 28 |
| Uncertainty Quantification | WMT 19 | COMET AUC 0.597 | 28 |
| Summarization | CNN/DailyMail | Hamming Score -0.158 | 28 |
