
A Geometric Taxonomy of Hallucinations in LLMs

About

The term "hallucination" in large language models conflates distinct phenomena with different geometric signatures in embedding space. We propose a taxonomy identifying three types: Type I, unfaithfulness (failure to engage with provided context); Type II, confabulation (invention of semantically foreign content); and Type III, factual error (incorrect claims within correct conceptual frames). We observe a striking asymmetry. On standard benchmarks where hallucinations are LLM-generated, detection is domain-local: AUROC is 0.76-0.99 within domains but 0.50 (chance level) across domains, and discriminative directions are approximately orthogonal between domains (mean cosine similarity -0.07). On human-crafted confabulations (invented institutions, redefined terminology, fabricated mechanisms), a single global direction achieves 0.96 AUROC with 3.8% cross-domain degradation. We interpret this divergence as follows: benchmarks capture generation artifacts (stylistic signatures of prompted fabrication), while human-crafted confabulations capture genuine topical drift. The geometric structure differs because the underlying phenomena differ. Type III errors show 0.478 AUROC, indistinguishable from chance. This reflects a theoretical constraint: embeddings encode distributional co-occurrence, not correspondence to external reality, so statements with identical contextual patterns occupy similar embedding regions regardless of truth value. The contribution is a geometric taxonomy clarifying the scope of embedding-based detection: Types I and II are detectable, while Type III requires external verification mechanisms.
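The domain-local pattern described above (high within-domain AUROC, chance-level cross-domain AUROC, near-orthogonal directions) can be illustrated with a small synthetic sketch. The abstract does not say how the discriminative directions are estimated, so this example assumes a difference-of-class-means direction and uses made-up Gaussian "embeddings" in which each domain's hallucinations are shifted along a different axis. The function names, dimensions, and shift size are all hypothetical, and the numbers it prints are not the paper's results.

```python
import math
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def direction(pos, neg):
    """Unit-norm difference of class means (an assumed choice of direction)."""
    diff = [a - b for a, b in zip(mean_vector(pos), mean_vector(neg))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def make_domain(rng, axis, dim=16, n=200, shift=2.0):
    """Synthetic domain: hallucinated items are shifted along one coordinate axis."""
    clean = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    halluc = [
        [rng.gauss(0, 1) + (shift if j == axis else 0.0) for j in range(dim)]
        for _ in range(n)
    ]
    return halluc, clean


def score(d, halluc, clean):
    """AUROC obtained by projecting both classes onto direction d."""
    return auroc([dot(d, x) for x in halluc], [dot(d, x) for x in clean])


rng = random.Random(0)
hal_a, ok_a = make_domain(rng, axis=0)  # hypothetical domain A
hal_b, ok_b = make_domain(rng, axis=1)  # hypothetical domain B

dir_a = direction(hal_a, ok_a)
dir_b = direction(hal_b, ok_b)

within_auroc = score(dir_a, hal_a, ok_a)  # direction fit and evaluated on A
cross_auroc = score(dir_a, hal_b, ok_b)   # direction from A evaluated on B
cos_ab = dot(dir_a, dir_b)                # both directions are unit-norm

print(f"within-domain AUROC: {within_auroc:.3f}")
print(f"cross-domain AUROC:  {cross_auroc:.3f}")
print(f"cosine(dir_a, dir_b): {cos_ab:.3f}")
```

In this toy setup the direction learned on one domain carries almost no signal in the other, because the two shifts lie on different axes. A single global direction, as reported for the human-crafted confabulations, would correspond to both domains being shifted along the same axis.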

Javier Marín • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Hallucination Detection | TruthfulQA (test) | AUC-ROC: 76 | 91 |
| Confabulation Detection | Human-crafted confabulations | Finance AUROC: 100 | 2 |
| Hallucination Detection | HaluEval Dialogue (test) | Groundedness (Gamma): 0.287 | 1 |
