Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection
About
Hallucination detection is critical for deploying large language models (LLMs) in real-world applications. Existing hallucination detection methods perform well when the training and test data come from the same domain, but they generalize poorly across domains. In this paper, we study an important yet overlooked problem, termed generalizable hallucination detection (GHD), which aims to train hallucination detectors on data from a single domain while ensuring robust performance across diverse related domains. In studying GHD, we simulate multi-turn dialogues that follow an LLM's initial response and observe an interesting phenomenon: across different domains, multi-turn dialogues initiated by a hallucinated response universally exhibit larger uncertainty fluctuations than those initiated by a factual one. Based on this phenomenon, we propose a new score, SpikeScore, which quantifies abrupt fluctuations in uncertainty over a multi-turn dialogue. Through both theoretical analysis and empirical validation, we show that SpikeScore achieves strong cross-domain separability between hallucinated and non-hallucinated responses. Experiments across multiple LLMs and benchmarks show that the SpikeScore-based detection method outperforms representative baselines in cross-domain generalization and surpasses advanced generalization-oriented methods, verifying its effectiveness for cross-domain hallucination detection.
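The abstract does not give SpikeScore's formula, so the following is only a minimal sketch of the idea under two stated assumptions: (1) each dialogue turn's uncertainty is the mean negative token log-probability of that turn, and (2) a "spike" is measured by the largest absolute second difference of the per-turn uncertainty sequence, which is large when uncertainty jumps and then falls back. The names `turn_uncertainty` and `spike_score` are hypothetical illustrations, not the paper's definitions.

```python
from typing import Sequence


def turn_uncertainty(token_logprobs: Sequence[float]) -> float:
    """Hypothetical per-turn uncertainty: mean negative token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def spike_score(uncertainties: Sequence[float]) -> float:
    """Hypothetical spike measure (not the paper's formula).

    Returns max_t |u[t+1] - 2*u[t] + u[t-1]|, the largest absolute second
    difference of the per-turn uncertainty curve. A sharp rise followed by
    a drop (a spike) yields a large value; a flat curve yields a small one.
    """
    if len(uncertainties) < 3:
        raise ValueError("need at least three turns to measure a spike")
    return max(
        abs(uncertainties[t + 1] - 2 * uncertainties[t] + uncertainties[t - 1])
        for t in range(1, len(uncertainties) - 1)
    )


# Toy per-turn uncertainties (made-up numbers, not results from the paper).
steady = [0.40, 0.42, 0.41, 0.43, 0.42]
spiky = [0.40, 0.45, 1.30, 0.50, 0.48]
print(spike_score(steady), spike_score(spiky))
```

Under this sketch, a higher score would flag a response as more likely hallucinated, in line with the observation that hallucination-initiated dialogues fluctuate more; a decision threshold would still have to be chosen, for example on data from the single training domain.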
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Hallucination Detection | TriviaQA | AUROC | 0.8697 | 265 |
| Hallucination Detection | TriviaQA (test) | AUROC (%) | 86.97 | 169 |
| Hallucination Detection | RAGTruth (test) | AUROC | 0.8535 | 83 |
| Hallucination Detection | MATH | Mean AUROC (%) | 81.57 | 72 |
| Hallucination Detection | CommonsenseQA | Mean AUROC | 0.7563 | 48 |
| Hallucination Detection | Belebele | Mean AUROC | 0.7719 | 48 |
| Hallucination Detection | CoQA | Mean AUROC | 0.8584 | 48 |
| Hallucination Detection | SVAMP | Mean AUROC (%) | 78.37 | 48 |
| Hallucination Detection | Average Cross-domain | Mean AUROC | 0.7874 | 48 |
| Hallucination Detection | RAGTruth | AUROC | 0.8535 | 36 |
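AUROC lies in [0, 1], so the percentage-scale entries above (for example 86.97) are the same metric multiplied by 100. As a minimal sketch of how such numbers are computed, the code below uses the rank (Mann-Whitney) form of AUROC: the probability that a randomly chosen hallucinated example (label 1) receives a higher detector score than a randomly chosen factual one, with ties counted as one half. The per-domain unweighted mean in `mean_auroc` is an assumed averaging scheme for illustration only; the exact aggregation behind the table's "Mean AUROC" entries is not stated here, and the toy scores are made up.

```python
from typing import Dict, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison; label 1 = hallucinated, 0 = factual."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_auroc(per_domain: Dict[str, Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Assumed aggregation: unweighted mean of per-domain AUROC values."""
    values = [auroc(scores, labels) for scores, labels in per_domain.values()]
    return sum(values) / len(values)


# Made-up detector scores and labels for two hypothetical target domains.
domains = {
    "domain_a": ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
    "domain_b": ([0.6, 0.7, 0.5, 0.2], [1, 0, 1, 0]),
}
print(mean_auroc(domains))
```

The pairwise loop is quadratic in the number of examples, which keeps the definition transparent; for large evaluation sets a sort-based rank computation or an existing library routine would be the practical choice.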