HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
About
The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we propose HalluGuard, an NTK-based score that exploits the geometry and representations induced by the NTK to jointly identify data-driven and reasoning-driven hallucinations. We evaluate HalluGuard across 10 diverse benchmarks, against 11 competitive baselines, and on 9 popular LLM backbones, consistently achieving state-of-the-art performance in detecting diverse forms of LLM hallucinations.
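The actual HalluGuard score is defined in the paper; as a minimal illustrative sketch of the underlying idea only, the snippet below computes an empirical NTK for a toy two-layer network, `K(x, x') = ∇θf(x)·∇θf(x')`, and flags a query as risky when it lies far from the training set in the NTK-induced geometry. The network, the `hallu_score` function, and the nearest-neighbor cosine rule are all hypothetical simplifications for exposition, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer MLP with analytic gradients, standing in for an LLM.
D, H = 8, 16
W1 = rng.normal(0, 1 / np.sqrt(D), (H, D))
w2 = rng.normal(0, 1 / np.sqrt(H), H)

def grad_theta(x):
    """Flattened gradient of the scalar output f(x) w.r.t. all parameters."""
    h = np.tanh(W1 @ x)
    g_w2 = h                                    # df/dw2
    g_W1 = np.outer(w2 * (1.0 - h ** 2), x)    # df/dW1 via the chain rule
    return np.concatenate([g_W1.ravel(), g_w2])

def ntk(x, y):
    """Empirical NTK entry: inner product of parameter gradients."""
    return grad_theta(x) @ grad_theta(y)

def hallu_score(query, train_set):
    """Toy proxy: 0 when the query aligns with some training point in
    NTK geometry, approaching 1 as it drifts away (higher = riskier)."""
    k_qq = ntk(query, query)
    sims = [ntk(query, x) / np.sqrt(k_qq * ntk(x, x)) for x in train_set]
    return 1.0 - max(sims)

train = [rng.normal(size=D) for _ in range(32)]
in_dist = train[0] + 0.01 * rng.normal(size=D)  # near the training data
out_dist = 10.0 * rng.normal(size=D)            # far from the training data

print(hallu_score(in_dist, train), hallu_score(out_dist, train))
```

Under this toy formulation, the in-distribution query scores near 0 while the far-away query scores noticeably higher, mirroring how training-time mismatch (data-driven risk) can be read off the NTK-induced geometry.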
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Hallucination Detection | HaluEval (test) | AUC-ROC | 80.79 | 126 |
| Reasoning | MATH 500 | Accuracy (%) | 81 | 59 |
| Hallucination Detection | SQuAD (test) | AUROC | 83.8 | 48 |
| Hallucination Detection | GSM8K (test) | AUROC (Reference) | 79.01 | 48 |
| Semantic Hallucination Detection | PAWS | AUROC | 91.24 | 36 |
| Hallucination Detection | GSM8K | AUROC | 80.62 | 20 |
| Reasoning | Natural | Accuracy | 70.96 | 12 |