Attention Head Entropy of LLMs Predicts Answer Correctness

About

Large language models (LLMs) often generate plausible yet incorrect answers, posing risks in safety-critical settings such as medicine. Human evaluation is expensive, and LLM-as-judge approaches risk introducing hidden errors. Recent white-box methods detect contextual hallucinations using model internals, focusing on where the attention mass is localized, but two questions remain open: do these approaches extend to predicting answer correctness, and do they generalize out of domain? We introduce Head Entropy, a method that predicts answer correctness from attention entropy patterns, specifically by measuring the spread of the attention mass. Using sparse logistic regression on per-head 2-Rényi entropies, Head Entropy matches or exceeds baselines in-distribution and generalizes substantially better out of domain, outperforming the closest baseline by +8.5% AUROC on average. We further show that attention patterns over the question/context alone, before answer generation, already carry predictive signal: in this setting, Head Entropy improves on the closest baseline by +17.7% AUROC on average. We evaluate across 5 instruction-tuned LLMs and 3 QA datasets spanning general knowledge, multi-hop reasoning, and medicine.
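To make the "spread of the attention mass" concrete, here is a minimal sketch of the per-head feature the abstract describes. The 2-Rényi (collision) entropy of a distribution p is -log(sum_i p_i^2). The function names, the nested-list layout `attn[layer][head]`, and the choice of a single attention row per head are assumptions for illustration; the abstract does not specify how attention rows are selected or aggregated.

```python
import math


def renyi2_entropy(weights):
    """2-Renyi (collision) entropy of one attention row: -log(sum_i p_i^2)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have positive total mass")
    # Renormalise defensively so the row is a proper distribution.
    probs = [w / total for w in weights]
    return -math.log(sum(p * p for p in probs))


def head_entropy_features(attn):
    """Flatten per-head entropies into one feature vector.

    Assumed (hypothetical) layout: attn[layer][head] is a list of attention
    weights over the attended tokens for a single query position.
    """
    return [renyi2_entropy(row) for layer in attn for row in layer]


# One layer, two heads: a diffuse head and a sharply focused head.
example_attn = [[[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]]]
features = head_entropy_features(example_attn)
```

A uniform row over n tokens gives the maximum value log(n), and a one-hot row gives 0, so higher values indicate more diffuse attention. A feature vector like this, with one entry per head, is the kind of input an L1-penalised (sparse) logistic regression could be trained on to predict correctness.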

Sophie Ostmeier, Brian Axelrod, Maya Varma, Asad Aali, Yabin Zhang, Magdalini Paschali, Sanmi Koyejo, Curtis Langlotz, Akshay Chaudhari • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Agent Task	GTA	Success Rate	19.89	16
Agent Task	ToolBench	Success Rate	38.68	16
Answer correctness prediction	TriviaQA, HotpotQA, and MedMCQA In-Distribution Average (val)	Head Entropy	0.91	12
Step-level correctness prediction	ToolBench	AUROC	0.7561	10
Step-level correctness prediction	GTA	AUROC	88.21	10
Answer correctness prediction	TriviaQA, HotpotQA, and MedMCQA OOD Generalization Average (val)	Head Entropy	0.68	6
Answer correctness prediction	HotpotQA (val)	Head Entropy	0.6	1
Answer correctness prediction	MedMCQA (val)	Head Entropy	0.79	1
Answer correctness prediction	TriviaQA (val)	Head Entropy	0.79	1
