Calibrating LLM Confidence by Probing Perturbed Representation Stability

About

Miscalibration in Large Language Models (LLMs) undermines their reliability, highlighting the need for accurate confidence estimation. We introduce CCPS (Calibrating LLM Confidence by Probing Perturbed Representation Stability), a novel method analyzing internal representational stability in LLMs. CCPS applies targeted adversarial perturbations to final hidden states, extracts features reflecting the model's response to these perturbations, and uses a lightweight classifier to predict answer correctness. CCPS was evaluated on LLMs from 8B to 32B parameters (covering Llama, Qwen, and Mistral architectures) using MMLU and MMLU-Pro benchmarks in both multiple-choice and open-ended formats. Our results show that CCPS significantly outperforms current approaches. Across four LLMs and three MMLU variants, CCPS reduces Expected Calibration Error by approximately 55% and Brier score by 21%, while increasing accuracy by 5 percentage points, Area Under the Precision-Recall Curve by 4 percentage points, and Area Under the Receiver Operating Characteristic Curve by 6 percentage points, all relative to the strongest prior method. CCPS delivers an efficient, broadly applicable, and more accurate solution for estimating LLM confidence, thereby improving their trustworthiness.
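To make the pipeline in the abstract concrete, here is a minimal toy sketch of the general idea: perturb a final hidden state against the chosen token, record how the token's probability responds, and collect those responses as features for a downstream classifier. This is not the authors' implementation. The linear softmax head `W`, the gradient-based perturbation direction, the step schedule (`eps_max`, `steps`), and the feature names are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def stability_features(h, W, token, eps_max=2.0, steps=20):
    """Toy stability features for one final hidden state (illustrative only).

    h: (d,) hidden state; W: (V, d) hypothetical output projection;
    token: index of the token the model produced.
    """
    p = softmax(W @ h)
    # Gradient of log p[token] w.r.t. h for a linear-softmax head.
    grad = W[token] - p @ W
    # Assumed perturbation: unit-norm direction that lowers p[token].
    direction = -grad / (np.linalg.norm(grad) + 1e-12)
    eps_grid = np.linspace(0.0, eps_max, steps + 1)
    logits = np.array([W @ (h + e * direction) for e in eps_grid])
    probs = np.array([softmax(z)[token] for z in logits])
    flipped = logits.argmax(axis=1) != token
    # Smallest step at which the top token changes (eps_max if it never does).
    flip_eps = eps_grid[np.argmax(flipped)] if flipped.any() else eps_max
    return {
        "p_original": float(probs[0]),
        "p_min": float(probs.min()),
        "mean_drop": float(probs[0] - probs.mean()),
        "flip_eps": float(flip_eps),
    }

# Hypothetical usage with random weights standing in for a real model.
rng = np.random.default_rng(0)
W = rng.normal(size=(50, 16))
h = rng.normal(size=16)
token = int(np.argmax(W @ h))
features = stability_features(h, W, token)
```

In a setup like this, one feature vector per answer could be passed to a small classifier trained to predict whether the answer was correct; the intuition is that a prediction whose probability collapses under a small perturbation is less trustworthy than one that stays stable.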

Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese H. Smiley, Kundan Thind, Mohammad M. Ghassemi • 2025

Related benchmarks

Task	Dataset	Result
Reading Comprehension	RACE	Accuracy 60.56	75
Confidence Estimation	VLCB Pooled Aggregate (test)	ECE 7.7	48
Large Vision-Language Model Evaluation	Unweighted Average	ECE 30	29
Question Answering	OpenBookQA published (test)	Accuracy 52	25
Vision-Language Question Answering	Pooled Shared (GQA, POPE, LLaVA-Wild, MMMU Pro, GMAI-MMBench, MME-Finance) (test)	Expected Calibration Error (ECE) 14.7	22
Mathematical Reasoning	Math-MC (test)	Accuracy 55.74	15
Commonsense Reasoning	HellaSwag published (test)	Accuracy 80.79	15
Multimodal Understanding	Cross-LVLM (Aggregate of GQA, GMAI-MMBench, POPE, MME-Finance, MMMU_Pro, LLaVA-Wild) (test)	ECE 28.7	8
Truthfulness and Calibration Evaluation	Cross-LVLM Pooled Average (GQA, POPE, etc.)	ECE 15.3	8
Calibration and Discrimination	Shared pooled aggregation (test)	Brier Score (BS) 0.1	4
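
For readers unfamiliar with the two calibration metrics reported here, the following is a minimal sketch of how Expected Calibration Error (ECE) and the Brier score are commonly computed from per-answer confidences and correctness labels. The equal-width binning with `n_bins=10` is an assumption; the exact binning used for the reported figures is not stated on this page.

```python
import numpy as np

def brier_score(conf, correct):
    """Mean squared gap between confidence and the 0/1 correctness label."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((conf - correct) ** 2))

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1] (assumed binning)."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index per prediction; bins are left-closed, and 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(conf, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # |accuracy - mean confidence| in the bin, weighted by bin share.
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap
    return float(ece)

# Hypothetical confidences and correctness labels for four answers.
conf = [0.9, 0.9, 0.6, 0.6]
correct = [1, 0, 1, 1]
ece = expected_calibration_error(conf, correct)
bs = brier_score(conf, correct)
```

Lower is better for both metrics: ECE measures how far stated confidence drifts from observed accuracy, while the Brier score also penalizes confidences that fail to separate correct from incorrect answers.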

