Attention Head Entropy of LLMs Predicts Answer Correctness
About
Large language models (LLMs) often generate plausible yet incorrect answers, posing risks in safety-critical settings such as medicine. Human evaluation is expensive, and LLM-as-judge approaches risk introducing hidden errors. Recent white-box methods detect contextual hallucinations using model internals, focusing on where the attention mass is localized, but two questions remain open: do these approaches extend to predicting answer correctness, and do they generalize out of domain? We introduce Head Entropy, a method that predicts answer correctness from attention entropy patterns, specifically by measuring the spread of the attention mass. Using sparse logistic regression on per-head 2-Rényi entropies, Head Entropy matches or exceeds baselines in-distribution and generalizes substantially better out of domain, outperforming the closest baseline by +8.5% AUROC on average. We further show that attention patterns over the question/context alone, before answer generation, already carry predictive signal: in this setting, Head Entropy improves on the closest baseline by +17.7% AUROC on average. We evaluate across 5 instruction-tuned LLMs and 3 QA datasets spanning general knowledge, multi-hop reasoning, and medicine.
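To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation. It computes the 2-Rényi (collision) entropy, H_2(p) = -log Σ_i p_i², of each attention row, reduces it to one feature per head, and fits an L1-regularized logistic regression. Several pieces are assumptions made for illustration: the function names, the averaging over query positions, the hyperparameters, and the plain proximal-gradient solver, which stands in for whatever sparse logistic regression solver the paper actually uses.

```python
import numpy as np


def renyi2_entropy(p, eps=1e-12):
    """2-Renyi entropy H_2(p) = -log(sum_i p_i^2), taken along the last axis."""
    p = np.asarray(p, dtype=float)
    p = p / (p.sum(axis=-1, keepdims=True) + eps)  # renormalize defensively
    return -np.log(np.sum(p ** 2, axis=-1) + eps)


def head_entropy_features(attn):
    """Map attention of shape (layers, heads, queries, keys) to one value per head.

    Averaging the entropy over query positions is an assumed aggregation;
    the abstract does not say how positions are pooled.
    """
    ent = renyi2_entropy(attn)               # (layers, heads, queries)
    return ent.mean(axis=-1).reshape(-1)     # (layers * heads,)


def fit_sparse_logreg(X, y, l1=0.05, lr=0.1, steps=2000):
    """L1-regularized logistic regression via proximal gradient descent.

    A simple stand-in solver (assumption); returns weights w and bias b.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = w - lr * (X.T @ (p - y) / n)
        # Soft-thresholding is the proximal step for the L1 penalty;
        # it drives uninformative head weights to exactly zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
        b = b - lr * np.mean(p - y)
    return w, b


def predict_proba(X, w, b):
    """Predicted probability that the answer is correct."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical attention tensor: 2 layers, 4 heads, 5 queries, 7 keys.
    logits = rng.normal(size=(2, 4, 5, 7))
    attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    print(head_entropy_features(attn).shape)
```

In practice, each row of `X` would hold the per-head features for one question-answer pair and `y` would mark whether the answer was correct. The L1 penalty is what makes the classifier sparse: only a subset of heads keeps a nonzero weight.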
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Agent Task | GTA | Success Rate: 19.89 | 16 |
| Agent Task | ToolBench | Success Rate: 38.68 | 16 |
| Answer correctness prediction | TriviaQA, HotpotQA, and MedMCQA In-Distribution Average (val) | Head Entropy: 0.91 | 12 |
| Step-level correctness prediction | ToolBench | AUROC: 0.7561 | 10 |
| Step-level correctness prediction | GTA | AUROC: 88.21 | 10 |
| Answer correctness prediction | TriviaQA, HotpotQA, and MedMCQA OOD Generalization Average (val) | Head Entropy: 0.68 | 6 |
| Answer correctness prediction | HotpotQA (val) | Head Entropy: 0.6 | 1 |
| Answer correctness prediction | MedMCQA (val) | Head Entropy: 0.79 | 1 |
| Answer correctness prediction | TriviaQA (val) | Head Entropy: 0.79 | 1 |