Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores
About
Large language models (LLMs) are often confidently wrong, making reliable uncertainty estimation (UE) essential. Output-based heuristics are cheap but brittle, while probing internal representations is effective yet high-dimensional and hard to transfer. We propose a compact, per-instance UE method that scores cross-layer agreement patterns in internal representations using a single forward pass. Across three models, our method matches probing in-distribution, with mean diagonal differences of at most $-1.8$ AUPRC percentage points and $+4.9$ Brier score points. Under cross-dataset transfer, it consistently outperforms probing, achieving off-diagonal gains up to $+2.86$ AUPRC and $+21.02$ Brier points. Under 4-bit weight-only quantization, it remains robust, improving over probing by $+1.94$ AUPRC points and $+5.33$ Brier points on average. Beyond performance, examining specific layer--layer interactions reveals differences in how disparate models encode uncertainty. Altogether, our UE method offers a lightweight, compact means to capture transferable uncertainty in LLMs.
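To make the idea of scoring cross-layer agreement concrete, here is a minimal, illustrative sketch: pairwise cosine similarity between per-layer hidden states from one forward pass, collapsed into a scalar. The similarity measure and the aggregation rule are assumptions for illustration, not the paper's exact scoring method.

```python
import numpy as np

def cross_layer_agreement(hidden_states: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between per-layer representations.

    hidden_states: (num_layers, hidden_dim) array, e.g. the final-token
    hidden state from each layer of a single forward pass.
    Returns a (num_layers, num_layers) layer-layer agreement matrix.
    """
    norms = np.linalg.norm(hidden_states, axis=1, keepdims=True)
    unit = hidden_states / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def uncertainty_score(agreement: np.ndarray) -> float:
    """Collapse the agreement matrix into a single scalar.

    Here, low mean off-diagonal agreement is read as high uncertainty;
    this aggregation is an illustrative choice, not the paper's rule.
    """
    off_diag = agreement[~np.eye(agreement.shape[0], dtype=bool)]
    return 1.0 - float(off_diag.mean())

# Toy example: 4 "layers" with 8-dim representations from one forward pass.
rng = np.random.default_rng(0)
states = rng.normal(size=(4, 8))
A = cross_layer_agreement(states)
score = uncertainty_score(A)
```

In practice the per-layer states would come from a model's hidden-state outputs (e.g. `output_hidden_states=True` in Hugging Face Transformers), and the resulting layer-layer scores form the compact per-instance feature the abstract describes.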
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Uncertainty Estimation | TriviaQA (test) | AUROC | 71.19 | 104 |
| Uncertainty Estimation | HotpotQA (test) | AUPRC | 72.79 | 12 |
| Uncertainty Estimation | Movies (test) | AUPRC (pp) | 66.56 | 6 |
| Uncertainty Estimation | Within-dataset Diagonal | AUPRC Difference | 1.37 | 3 |
| Uncertainty Estimation | Across-dataset Off-diagonal | AUPRC Difference (pp) | 2.86 | 3 |