
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

About

The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we propose HalluGuard, a score that leverages the induced geometry and captured representations of the Neural Tangent Kernel (NTK) to jointly identify data-driven and reasoning-driven hallucinations. Evaluated across 10 diverse benchmarks against 11 competitive baselines on 9 popular LLM backbones, HalluGuard consistently achieves state-of-the-art performance in detecting diverse forms of LLM hallucinations.
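The decomposition described above can be written schematically. The symbols below are illustrative placeholders only; the abstract does not give the paper's exact bound or notation:

```latex
% Schematic form of the decomposition described in the abstract.
% R_data and R_reason are illustrative placeholders, not the paper's notation.
\[
  \mathcal{R}_{\mathrm{halluc}}(f_\theta)
  \;\le\;
  \underbrace{\mathcal{R}_{\mathrm{data}}(f_\theta)}_{\text{training-time mismatch}}
  \;+\;
  \underbrace{\mathcal{R}_{\mathrm{reason}}(f_\theta)}_{\text{inference-time instability}}
\]
```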
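To make the NTK-based scoring idea concrete, here is a minimal sketch in PyTorch. This is not the paper's HalluGuard implementation: the toy model, the trusted reference set, the perturbation-based instability proxy, and the mixing weight `alpha` are all assumptions made for exposition. It only illustrates the general recipe the abstract describes: use empirical NTK features (parameter gradients) to measure (i) how far an input lies from training-like data and (ii) how unstable its representation is at inference time.

```python
# Illustrative sketch only -- NOT the paper's HalluGuard implementation.
# The empirical NTK of a model f_theta is k(x, x') = <phi(x), phi(x')>,
# where phi(x) = d f_theta(x) / d theta (the parameter gradient at x).
import torch

torch.manual_seed(0)

# Toy stand-in for an LLM: a small MLP with a scalar output head.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

def ntk_feature(x: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the scalar output w.r.t. all parameters."""
    out = model(x).sum()
    grads = torch.autograd.grad(out, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def ntk_hallucination_score(x, reference, n_perturb=8, noise=0.05, alpha=0.5):
    phi_x = ntk_feature(x)
    # Data-driven term: 1 - max cosine similarity between phi(x) and the
    # NTK features of a trusted reference set (an assumed proxy for
    # training-time coverage / mismatch).
    sims = torch.stack([
        torch.nn.functional.cosine_similarity(phi_x, ntk_feature(r), dim=0)
        for r in reference
    ])
    data_risk = 1.0 - sims.max()
    # Reasoning-driven term: drift of phi(x) under small input perturbations
    # (an assumed proxy for inference-time instability).
    drift = torch.stack([
        1.0 - torch.nn.functional.cosine_similarity(
            phi_x, ntk_feature(x + noise * torch.randn_like(x)), dim=0)
        for _ in range(n_perturb)
    ])
    reason_risk = drift.mean()
    # alpha is an assumed trade-off weight between the two components.
    return alpha * data_risk + (1 - alpha) * reason_risk

reference = [torch.randn(1, 16) for _ in range(20)]  # trusted examples
query = torch.randn(1, 16)
print(float(ntk_hallucination_score(query, reference)))
```

For a real LLM the full parameter gradient is intractable, so practical NTK-based scores typically restrict attention to a subset of layers or use randomized sketching; the abstract does not specify which approximation HalluGuard uses.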

Xinyue Zeng, Junhong Lin, Yujun Yan, Feng Guo, Liang Shi, Jun Wu, Dawei Zhou • 2026

Related benchmarks

Task                              Dataset          Metric              Result   Rank
Hallucination Detection           HaluEval (test)  AUC-ROC             80.79    126
Reasoning                         MATH 500         Accuracy (%)        81       59
Hallucination Detection           SQuAD (test)     AUROC               83.8     48
Hallucination Detection           GSM8K (test)     AUROC (Reference)   79.01    48
Semantic Hallucination Detection  PAWS             AUROC               91.24    36
Hallucination Detection           GSM8K            AUROC               80.62    20
Reasoning                         Natural          Accuracy            70.96    12
