
Attention Head Entropy of LLMs Predicts Answer Correctness

About

Large language models (LLMs) often generate plausible yet incorrect answers, posing risks in safety-critical settings such as medicine. Human evaluation is expensive, and LLM-as-judge approaches risk introducing hidden errors. Recent white-box methods detect contextual hallucinations using model internals, focusing on where the attention mass is localized, but two questions remain open: do these approaches extend to predicting answer correctness, and do they generalize out of domain? We introduce Head Entropy, a method that predicts answer correctness from attention entropy patterns, specifically by measuring the spread of the attention mass. Using sparse logistic regression on per-head 2-Rényi entropies, Head Entropy matches or exceeds baselines in-distribution and generalizes substantially better out of domain, outperforming the closest baseline by +8.5% AUROC on average. We further show that attention patterns over the question/context alone, before answer generation, already carry predictive signal: in this setting Head Entropy improves on the closest baseline by +17.7% AUROC on average. We evaluate across 5 instruction-tuned LLMs and 3 QA datasets spanning general knowledge, multi-hop reasoning, and medicine.
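To make "per-head 2-Rényi entropies" concrete, the sketch below computes the 2-Rényi (collision) entropy H_2(p) = -log(sum_i p_i^2) of each attention row and collapses it to one feature per head. This is an illustration under stated assumptions, not the authors' implementation: the nested-list layout `attn[layer][head][query][key]`, the averaging over query tokens, and the function names `renyi2_entropy` and `head_entropy_features` are all hypothetical. The resulting feature vector is the kind of input an L1-penalized (sparse) logistic regression could then be trained on.

```python
import math


def renyi2_entropy(attn_row):
    """2-Renyi (collision) entropy, in nats, of one attention distribution."""
    total = sum(attn_row)
    if total <= 0:
        raise ValueError("attention row must have positive mass")
    # Renormalize defensively so the row sums to 1.
    probs = [a / total for a in attn_row]
    return -math.log(sum(p * p for p in probs))


def head_entropy_features(attn):
    """Return one entropy feature per attention head.

    Assumed (hypothetical) layout: attn[layer][head] is a list of attention
    rows, one per query token. Averaging over query tokens is an assumption
    made for this sketch; the source does not specify the aggregation.
    """
    features = []
    for layer in attn:
        for head in layer:
            entropies = [renyi2_entropy(row) for row in head]
            features.append(sum(entropies) / len(entropies))
    return features


# Toy input: one layer with two heads.
toy_attn = [
    [
        [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]],  # spread row, peaked row
        [[0.5, 0.5]],
    ]
]
toy_features = head_entropy_features(toy_attn)
```

A uniform row over n keys gives the maximum value log(n), and a one-hot row gives 0, so higher values indicate attention that is spread more widely.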

Sophie Ostmeier, Brian Axelrod, Maya Varma, Asad Aali, Yabin Zhang, Magdalini Paschali, Sanmi Koyejo, Curtis Langlotz, Akshay Chaudhari • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Agent Task | GTA | Success Rate 19.89 | 16 |
| Agent Task | ToolBench | Success Rate 38.68 | 16 |
| Answer correctness prediction | TriviaQA, HotpotQA, and MedMCQA In-Distribution Average (val) | Head Entropy 0.91 | 12 |
| Step-level correctness prediction | ToolBench | AUROC 0.7561 | 10 |
| Step-level correctness prediction | GTA | AUROC 88.21 | 10 |
| Answer correctness prediction | TriviaQA, HotpotQA, and MedMCQA OOD Generalization Average (val) | Head Entropy 0.68 | 6 |
| Answer correctness prediction | HotpotQA (val) | Head Entropy 0.6 | 1 |
| Answer correctness prediction | MedMCQA (val) | Head Entropy 0.79 | 1 |
| Answer correctness prediction | TriviaQA (val) | Head Entropy 0.79 | 1 |
