
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

About

The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we propose HalluGuard, a Neural Tangent Kernel (NTK)-based score that leverages the geometry induced by the NTK and the representations it captures to jointly identify data-driven and reasoning-driven hallucinations. We evaluate HalluGuard on 10 diverse benchmarks against 11 competitive baselines and 9 popular LLM backbones, consistently achieving state-of-the-art performance in detecting diverse forms of LLM hallucinations. We open-source HalluGuard at https://github.com/Susan571/HalluGuard-ICLR2026.
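To make the idea concrete, below is a minimal, illustrative sketch of an NTK-style score; it is not the paper's actual HalluGuard implementation. It treats parameter gradients of a scalar model output as empirical-NTK features and scores a query by its kernel similarity to a trusted reference set. The function names (ntk_features, halluguard_style_score), the cosine-normalized kernel, and the mean/dispersion combination are all assumptions made for this sketch.

```python
# Illustrative sketch only: an empirical-NTK similarity score in the spirit
# of the abstract above, NOT the HalluGuard method defined in the paper.
# Assumptions: `model` is a small differentiable scorer whose output can be
# reduced to a scalar; parameter gradients serve as empirical-NTK features.
import torch


def ntk_features(model, x):
    """Flattened gradient of the scalar model output w.r.t. parameters
    (the feature map whose inner product gives the empirical NTK)."""
    out = model(x).sum()  # reduce to a scalar so grad() is well-defined
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.flatten() for g in grads])


def halluguard_style_score(model, query, references, eps=1e-8):
    """Score a query by its NTK similarity to trusted reference inputs.

    Reading (assumed for this sketch): low mean similarity to the reference
    geometry proxies the data-driven risk component, while high dispersion
    across references proxies reasoning-driven instability. Requires at
    least two references so the dispersion term is defined.
    """
    q = ntk_features(model, query)
    sims = torch.stack([
        torch.dot(q, f) / (q.norm() * f.norm() + eps)  # cosine-normalized NTK
        for f in (ntk_features(model, r) for r in references)
    ])
    # Equal weighting of the two components is an illustrative choice.
    return (1.0 - sims.mean()) + sims.std()
```

Usage would look like score = halluguard_style_score(model, x_query, [x1, x2, x3]), with higher values flagging riskier generations; the paper's actual score and weighting are those defined by its Hallucination Risk Bound.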

Xinyue Zeng, Junhong Lin, Yujun Yan, Feng Guo, Liang Shi, Jun Wu, Dawei Zhou • 2026

Related benchmarks

Task                              Dataset           Metric              Result   Rank
Hallucination Detection           HaluEval (test)   AUC-ROC             80.79    126
Reasoning                         MATH 500          Accuracy (%)        81       90
Hallucination Detection           SQuAD (test)      AUROC               83.8     48
Hallucination Detection           GSM8K (test)      AUROC (Reference)   79.01    48
Semantic Hallucination Detection  PAWS              AUROC               91.24    36
Hallucination Detection           GSM8K             AUROC               80.62    20
Reasoning                         Natural           Accuracy            70.96    12
