
Calibrating LLMs with Information-Theoretic Evidential Deep Learning

About

Fine-tuned large language models (LLMs) often exhibit overconfidence, particularly when trained on small datasets, resulting in poor calibration and inaccurate uncertainty estimates. Evidential Deep Learning (EDL), an uncertainty-aware approach, enables uncertainty estimation in a single forward pass, making it a promising method for calibrating fine-tuned LLMs. However, despite its computational efficiency, EDL is prone to overfitting, as its training objective can result in overly concentrated probability distributions. To mitigate this, we propose regularizing EDL by incorporating an information bottleneck (IB). Our approach, IB-EDL, suppresses spurious information in the evidence generated by the model and encourages truly predictive information to influence both the predictions and uncertainty estimates. Extensive experiments across various fine-tuned LLMs and tasks demonstrate that IB-EDL outperforms both existing EDL and non-EDL approaches. By improving the trustworthiness of LLMs, IB-EDL facilitates their broader adoption in domains requiring high levels of confidence calibration. Code is available at https://github.com/sandylaker/ib-edl.
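As background for the abstract: standard EDL interprets the model's non-negative output evidence as parameters of a Dirichlet distribution, yielding class probabilities and an uncertainty score in a single forward pass. Below is a minimal sketch of this standard EDL prediction rule (following the common formulation alpha = evidence + 1); the function names are illustrative and not taken from the IB-EDL codebase, and the paper's actual IB regularizer is not reproduced here.

```python
def edl_predict(evidence):
    """Standard EDL prediction from non-negative per-class evidence.

    alpha = evidence + 1 parameterizes a Dirichlet distribution.
    Expected class probabilities are alpha / S, and total predictive
    uncertainty is K / S, where S = sum(alpha) and K = number of classes.
    Low total evidence -> S close to K -> uncertainty close to 1.
    """
    alpha = [e + 1.0 for e in evidence]
    S = sum(alpha)
    K = len(alpha)
    probs = [a / S for a in alpha]
    uncertainty = K / S  # large when the model has gathered little evidence
    return probs, uncertainty


# Example: strong evidence for class 0 out of three classes.
probs, u = edl_predict([4.0, 1.0, 0.0])
# probs == [0.625, 0.25, 0.125], uncertainty == 0.375
```

Overfitting in EDL shows up as overly concentrated alpha (very large S, near-zero uncertainty even on ambiguous inputs); the IB term described in the abstract is designed to suppress such spurious evidence.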

Yawei Li, David Rügamer, Bernd Bischl, Mina Rezaei • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Out-of-Distribution Detection | CIFAR-10 | AUROC 87.58 | 121 |
| Out-of-Distribution Detection | ImageNet | -- | 108 |
| Out-of-Distribution Detection | CIFAR-10 vs CIFAR-100 | AUROC 79.76 | 70 |
| Multiple-choice Question Answering | OBQA | Accuracy 85.13 | 69 |
| Multiple-choice Question Answering | RACE | Accuracy 87.98 | 54 |
| Out-of-Distribution Detection | OBQA to MMLU | AUROC 83.16 | 41 |
| Out-of-Distribution Detection | RACE to MMLU | AUROC 80.48 | 41 |
| Out-of-Distribution Detection | CIFAR-10 vs SVHN | AUROC 78.22 | 31 |
| Out-of-Distribution Detection | ImageNet-R | ROC AUC 0.5503 | 28 |
| Uncertainty Estimation | OBQA | AUROC 80.67 | 24 |

(Showing 10 of 23 rows)
