Prompt-Guided Internal States for Hallucination Detection of Large Language Models

About

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide variety of tasks and domains. However, they sometimes generate responses that are logically coherent but factually incorrect or misleading, a phenomenon known as LLM hallucination. Data-driven supervised methods train hallucination detectors on the internal states of LLMs, but detectors trained on specific domains often fail to generalize to other domains. In this paper, we aim to enhance the cross-domain performance of supervised detectors using only in-domain data. We propose a novel framework, prompt-guided internal states for hallucination detection of LLMs, named PRISM. By using appropriate prompts to guide changes in the structure of the LLM's internal states that relates to text truthfulness, we make this structure more salient and more consistent across texts from different domains. We integrated our framework with existing hallucination detection methods and conducted experiments on datasets from different domains. The experimental results indicate that our framework significantly enhances the cross-domain generalization of existing hallucination detection methods.
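The general recipe described above (wrap each input in a guiding prompt, read the LLM's internal states, and train a supervised detector on in-domain labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not PRISM's implementation: the prompt template, the `embed` callable standing in for hidden-state extraction, and the logistic-regression probe are hypothetical choices, and the toy features are synthetic rather than real LLM activations.

```python
import numpy as np
from typing import Callable, Sequence

# HYPOTHETICAL guiding prompt: PRISM's actual prompts are not given on this page.
PROMPT_TEMPLATE = (
    "Is the following statement true or false?\n"
    "Statement: {text}\n"
    "Answer:"
)


def prompt_guided_features(
    texts: Sequence[str], embed: Callable[[str], np.ndarray]
) -> np.ndarray:
    """Wrap each text in the guiding prompt and stack one internal-state vector per text.

    `embed` is an assumed stand-in for reading an LLM hidden state,
    e.g. a chosen layer's activation at the last prompt token.
    """
    return np.stack([embed(PROMPT_TEMPLATE.format(text=t)) for t in texts])


def train_probe(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500):
    """Fit a logistic-regression probe (label 1 = hallucinated) by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(hallucinated)
        err = p - y                              # log-loss gradient w.r.t. the logits
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def hallucination_scores(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Probe output in [0, 1]; higher means more likely hallucinated."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


# Toy usage with synthetic "internal states" (no real LLM involved).
rng = np.random.default_rng(0)
X_in = np.vstack([rng.normal(-1.0, 1.0, (50, 4)), rng.normal(1.0, 1.0, (50, 4))])
y_in = np.array([0] * 50 + [1] * 50)
w, b = train_probe(X_in, y_in)
scores = hallucination_scores(X_in, w, b)
```

A linear probe is used here only because it is the simplest supervised detector over internal states. In a cross-domain evaluation, the probe would be trained on features from one domain and scored on features from another; the prompt wrapping is the step intended to make those features more consistent across domains.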

Fujie Zhang, Peiqi Yu, Biao Yi, Baolei Zhang, Tong Li, Zheli Liu • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Hallucination Detection | TriviaQA | AUROC | 0.7017 | 265
Hallucination Detection | MATH | Mean AUROC | 65.16 | 72
Hallucination Detection | CommonsenseQA | Mean AUROC | 0.7187 | 48
Hallucination Detection | CoQA | Mean AUROC | 0.8116 | 48
Hallucination Detection | Belebele | Mean AUROC | 0.7097 | 48
Hallucination Detection | Average Cross-domain | Mean AUROC | 0.7072 | 48
Hallucination Detection | SVAMP | Mean AUROC | 68.11 | 48
Hallucination Detection | Mixture-domain (TriviaQA, CSQA, Belebele, CoQA, Math, SVAMP) | AUROC | 0.672 | 24
Hallucination Detection | LogicStruct | Neg. Score | 73.89 | 14
Hallucination Detection | True-False | Animals Score | 74.05 | 7

Showing 10 of 12 rows.
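Most results above are AUROC values: the probability that a randomly chosen hallucinated response receives a higher detector score than a randomly chosen truthful one, so 0.5 is chance level and 1.0 is perfect separation. Values such as 65.16 and 68.11 are presumably the same quantity on a 0 to 100 scale. The following is a minimal sketch of the pairwise (Mann-Whitney) formulation with ties counted as one half; the function name `auroc` and the label convention (1 = hallucinated) are our own assumptions, not the evaluation code behind these benchmark entries.

```python
import numpy as np


def auroc(scores, labels) -> float:
    """AUROC as P(score_pos > score_neg), with ties counted as 1/2.

    labels: 1 = hallucinated (positive), 0 = truthful (negative).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


# Positives score 0.9 and 0.3, negatives 0.8 and 0.1: 3 of 4 pairs are ordered correctly.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise comparison costs O(positives × negatives), which is fine for illustration; a rank-based computation gives the same value in O(n log n).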
