
From Out-of-Distribution Detection to Hallucination Detection: A Geometric View

About

Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve strong performance in question-answering tasks, they remain less effective on tasks requiring reasoning. In this work, we revisit hallucination detection through the lens of out-of-distribution (OOD) detection, a well-studied problem in areas like computer vision. Treating next-token prediction in language models as a classification task allows us to apply OOD techniques, provided appropriate modifications are made to account for the structural differences in large language models. We show that OOD-based approaches yield training-free, single-sample-based detectors, achieving strong accuracy in hallucination detection for reasoning tasks. Overall, our work suggests that reframing hallucination detection as OOD detection provides a promising and scalable pathway toward language model safety.
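As a rough illustration of the reframing described above, the sketch below treats each next-token prediction as a classification over the vocabulary and scores it with two classic OOD baselines from the vision literature: maximum softmax probability and the energy score. This is not the geometric method proposed in the paper, whose details are not given here. The function names, the mean aggregation over tokens, and the 0.5 threshold are all assumptions made for illustration only.

```python
import math
from typing import Callable, Sequence


def log_sum_exp(logits: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def max_softmax_prob(logits: Sequence[float]) -> float:
    """Maximum softmax probability: higher means more 'in-distribution'."""
    return math.exp(max(logits) - log_sum_exp(logits))


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Negative free energy T * logsumexp(logits / T): higher means more 'in-distribution'."""
    scaled = [x / temperature for x in logits]
    return temperature * log_sum_exp(scaled)


def sequence_score(
    per_token_logits: Sequence[Sequence[float]],
    token_score: Callable[[Sequence[float]], float] = max_softmax_prob,
) -> float:
    """Average the per-token scores over one generated sequence.

    Mean aggregation is an assumption of this sketch, not the paper's choice.
    """
    scores = [token_score(logits) for logits in per_token_logits]
    return sum(scores) / len(scores)


def flag_hallucination(
    per_token_logits: Sequence[Sequence[float]], threshold: float = 0.5
) -> bool:
    """Flag a single generation when its score falls below a hypothetical threshold."""
    return sequence_score(per_token_logits) < threshold


# Toy logits over a 3-token vocabulary for two generated tokens each.
confident = [[10.0, 0.0, 0.0], [8.0, 0.0, 0.0]]
uncertain = [[1.0, 1.0, 1.0], [0.5, 0.4, 0.6]]

print(flag_hallucination(confident))
print(flag_hallucination(uncertain))
```

Both scores need only the logits of a single generation, which matches the training-free, single-sample setting the abstract describes. In practice the threshold would be chosen on held-out data rather than fixed in advance.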

Litian Liu, Reza Pourreza, Yubing Jian, Yao Qin, Roland Memisevic • 2026

Related benchmarks

Task                     Dataset  Metric  Result  Rank
Hallucination Detection  CSQA     AUROC   72.47   55
Hallucination Detection  GSM8K    AUROC   80.6    53
Hallucination Detection  AQUA     AUROC   0.7822  31
