
Calibrating Verbalized Confidence with Self-Generated Distractors

About

Calibrated confidence estimates are necessary for large language model (LLM) outputs to be trusted by human users. While LLMs can express their confidence in human-interpretable ways, verbalized LLM-generated confidence scores have empirically been found to be miscalibrated, reporting high confidence on instances with low accuracy and thereby harming trust and safety. We hypothesize that this overconfidence often stems from a given LLM's heightened suggestibility when faced with claims that it encodes little information about; we empirically validate this hypothesis, finding more suggestibility on lower-accuracy claims. Building on this finding, we introduce Distractor-Normalized Coherence (DINCO), which estimates and accounts for an LLM's suggestibility bias by having the model verbalize its confidence independently across several self-generated distractors (i.e., alternative claims), and normalizes by the total verbalized confidence. To further improve calibration, we leverage generator-validator disagreement, augmenting normalized validator confidence with a consistency-based estimate of generator confidence. Here, we frame the popular approach of self-consistency as leveraging coherence across sampled generations, and normalized verbalized confidence as leveraging coherence across validations on incompatible claims, allowing us to integrate these complementary dimensions of coherence into DINCO. Moreover, our analysis shows that DINCO provides less saturated, and therefore more usable, confidence estimates, and that further sampling alone cannot close the gap between DINCO and baselines, with DINCO at 10 inference calls outperforming self-consistency at 100.
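The core normalization step described above can be illustrated with a small sketch. This is not the authors' implementation; the function name, the simple ratio-based normalization, and the linear combination with a self-consistency estimate are all assumptions made for illustration.

```python
# Hedged sketch of the DINCO idea from the abstract: the validator's
# verbalized confidence on the answer is normalized by the total
# confidence it assigns across the answer and its self-generated,
# mutually incompatible distractors. Names and the combination rule
# are illustrative assumptions, not the paper's exact method.

def dinco_confidence(answer_conf, distractor_confs,
                     generator_agreement=None, weight=0.5):
    """Return a distractor-normalized confidence estimate.

    answer_conf: verbalized confidence on the candidate answer (0..1).
    distractor_confs: verbalized confidences on each distractor claim.
    generator_agreement: optional self-consistency estimate, i.e. the
        fraction of sampled generations that agree with the answer.
    weight: assumed mixing weight between the two coherence signals.
    """
    total = answer_conf + sum(distractor_confs)
    normalized = answer_conf / total if total > 0 else 0.0
    if generator_agreement is None:
        return normalized
    # Augment validator-side coherence with generator-side coherence.
    return weight * normalized + (1 - weight) * generator_agreement

# A suggestible model may report 0.9 on the answer while also reporting
# high confidence on incompatible distractors; normalization deflates it.
print(dinco_confidence(0.9, [0.8, 0.7]))  # 0.9 / 2.4 = 0.375
```

A raw verbalized score of 0.9 would look near-certain, but once the model's (incoherent) confidence on incompatible alternatives is accounted for, the normalized estimate drops sharply, which is the suggestibility correction the paper targets.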

Victor Wang, Elias Stengel-Eskin • 2025

Related benchmarks

Task | Dataset | Metric | Value | Rank
Question Answering | TriviaQA | ECE | 0.044 | 28
Calibration | stream 2,000-question | ECE | 10.1 | 21
Correctness detection | MMLU-Pro | AUROC | 71.33 | 20
Correctness detection | Non-MAQA | AUROC | 57.49 | 20
Correctness detection | Non-AmbigQA | AUROC | 52.75 | 20
Short-form Question Answering | SimpleQA | ECE | 7.9 | 18
Long-form Factuality Calibration | FactScore | ECE | 0.076 | 8
Confidence calibration | TriviaQA (test) | Expected Calibration Error | 0.065 | 7
Question Answering | BioASQ Task B 14 2026 challenge edition (sampled 1000 factoid questions) | ECE | 0.071 | 7
