Hallucination Detection in LLMs Using Spectral Features of Attention Maps

About

Large Language Models (LLMs) have demonstrated remarkable performance across various tasks but remain prone to hallucinations. Detecting hallucinations is essential for safety-critical applications, and recent methods leverage attention map properties to this end, though their effectiveness remains limited. In this work, we investigate the spectral features of attention maps by interpreting them as adjacency matrices of graph structures. We propose the $\text{LapEigvals}$ method, which utilises the top-$k$ eigenvalues of the Laplacian matrix derived from the attention maps as an input to hallucination detection probes. Empirical evaluations demonstrate that our approach achieves state-of-the-art hallucination detection performance among attention-based methods. Extensive ablation studies further highlight the robustness and generalisation of $\text{LapEigvals}$, paving the way for future advancements in the hallucination detection domain.
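As a rough illustration of the idea in the abstract, the sketch below treats a single attention map as a weighted adjacency matrix, builds a graph Laplacian from it, and keeps the top-k eigenvalues as a feature vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact Laplacian definition, so the code assumes the standard combinatorial form L = D - A and symmetrises the attention map so the eigenvalues are real. The function name `top_k_laplacian_eigvals` and the toy attention map are hypothetical.

```python
import numpy as np


def top_k_laplacian_eigvals(attn: np.ndarray, k: int) -> np.ndarray:
    """Return the k largest eigenvalues of L = D - A for one attention map.

    Assumption (not from the paper): the map is symmetrised first, so the
    graph is undirected and the Laplacian has real eigenvalues.
    """
    adj = 0.5 * (attn + attn.T)        # undirected weighted adjacency matrix
    deg = np.diag(adj.sum(axis=1))     # diagonal degree matrix D
    lap = deg - adj                    # combinatorial Laplacian L = D - A
    eigvals = np.linalg.eigvalsh(lap)  # real eigenvalues, ascending order
    return eigvals[::-1][:k]           # largest k, descending order


# Toy causal attention map for a 6-token sequence (random scores, row softmax).
rng = np.random.default_rng(0)
n_tokens = 6
scores = rng.normal(size=(n_tokens, n_tokens))
causal_mask = np.tril(np.ones((n_tokens, n_tokens), dtype=bool))
scores = np.where(causal_mask, scores, -np.inf)  # block attention to future tokens
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)          # each row sums to 1

features = top_k_laplacian_eigvals(attn, k=3)
print(features)
```

In a full pipeline, such vectors would presumably be computed for each attention head and layer and concatenated before being passed to a hallucination detection probe, for example a simple linear classifier trained on labelled generations.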

Jakub Binkowski, Denis Janiak, Albert Sawczyn, Bogdan Gabrys, Tomasz Kajdanowicz • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Hallucination Detection	TriviaQA	AUROC	0.874	621
Hallucination Detection	TruthfulQA	AUC (ROC)	0.806	178
Hallucination Detection	GSM8K	AUROC	82.6	115
Hallucination Detection	TruthfulQA (test)	AUC-ROC	58.9	112
Hallucination Detection	NQ-Open	AUROC	0.82	63
Hallucination Detection	HaluEvalQA	ROC-AUC	87.8	39
Hallucination Detection	SQuAD v2	ROC-AUC	0.785	28
Hallucination Detection	UMWP	ROC-AUC	86.4	28
Hallucination Detection	LLaMa 1 (test)	AUROC	0.871	15
