
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection

About

The surge in applications of large language models (LLMs) has prompted concerns about the generation of misleading or fabricated information, known as hallucinations. Therefore, detecting hallucinations has become critical to maintaining trust in LLM-generated content. A primary challenge in learning a truthfulness classifier is the lack of a large amount of labeled truthful and hallucinated data. To address the challenge, we introduce HaloScope, a novel learning framework that leverages the unlabeled LLM generations in the wild for hallucination detection. Such unlabeled data arises freely upon deploying LLMs in the open world, and consists of both truthful and hallucinated information. To harness the unlabeled data, we present an automated membership estimation score for distinguishing between truthful and untruthful generations within unlabeled mixture data, thereby enabling the training of a binary truthfulness classifier on top. Importantly, our framework does not require extra data collection and human annotations, offering strong flexibility and practicality for real-world applications. Extensive experiments show that HaloScope can achieve superior hallucination detection performance, outperforming the competitive rivals by a significant margin. Code is available at https://github.com/deeplearningwisc/haloscope.
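The abstract describes a two-stage pipeline: score each unlabeled generation, then train a binary truthfulness classifier on the resulting split. It does not say how the membership estimation score is computed. The sketch below therefore assumes one plausible form: the squared projection of a centered embedding onto the top singular direction of the unlabeled data. The function names, the 0.8 quantile threshold, the plain logistic-regression classifier, and the synthetic embeddings are all hypothetical illustrations, not the released HaloScope code.

```python
import numpy as np

def membership_scores(embeddings: np.ndarray) -> np.ndarray:
    """Assumed score: squared projection onto the top singular direction."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

def pseudo_label(scores: np.ndarray, quantile: float = 0.8) -> np.ndarray:
    """Mark the highest-scoring samples as hallucinated (hypothetical threshold)."""
    return (scores > np.quantile(scores, quantile)).astype(float)

def fit_logistic(features, labels, lr=0.1, steps=2000):
    """Train a logistic-regression classifier with plain gradient descent."""
    x = np.hstack([features, np.ones((len(features), 1))])  # append bias column
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * x.T @ (probs - labels) / len(labels)
    return w

def predict(w, features):
    """Return the predicted probability that each sample is hallucinated."""
    x = np.hstack([features, np.ones((len(features), 1))])
    return 1.0 / (1.0 + np.exp(-(x @ w)))

# Synthetic stand-in for LLM embeddings: the minority "hallucinated" group
# is shifted along one axis. Real embeddings would come from the model.
rng = np.random.default_rng(0)
truthful = rng.normal(0.0, 1.0, size=(160, 8))
hallucinated = rng.normal(0.0, 1.0, size=(40, 8))
hallucinated[:, 0] += 6.0
unlabeled = np.vstack([truthful, hallucinated])
true_labels = np.concatenate([np.zeros(160), np.ones(40)])  # used only for checking

scores = membership_scores(unlabeled)
pseudo = pseudo_label(scores)              # no human labels involved
weights = fit_logistic(unlabeled, pseudo)  # classifier trained on pseudo-labels
predicted = predict(weights, unlabeled) > 0.5
accuracy = float((predicted == true_labels).mean())
```

The ground-truth labels appear only in the final accuracy check; training uses the pseudo-labels alone, which mirrors the claim that no human annotation is required.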

Xuefeng Du, Chaowei Xiao, Yixuan Li • 2024

Related benchmarks

Task                     Dataset              Result            Rank
Hallucination Detection  TriviaQA             AUROC 0.8654      438
Hallucination Detection  TriviaQA (test)      AUC-ROC 85.3      183
Hallucination Detection  TruthfulQA (test)    AUC-ROC 70.6      105
Hallucination Detection  TruthfulQA           AUC (ROC) 0.7864  102
Hallucination Detection  NQ-Open              AUROC 0.8584      61
Hallucination Detection  MMLU-Pro             --                30
Hallucination Detection  LLaMa 1 (test)       AUROC 0.861       15
Hallucination Detection  WebQuestions         AUROC 80.43       15
Hallucination Detection  TyDiQA (test)        AUROC 69          14
Hallucination Detection  NQ open (test)       AUROC 62.7        14
Showing 10 of 12 rows
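
The results above report AUROC on two scales: fractions such as 0.8654 and percentages such as 85.3. As a minimal sketch of what the metric measures, the function below computes AUROC as the probability that a randomly chosen hallucinated sample receives a higher detector score than a randomly chosen truthful one, counting ties as half. Treating label 1 as "hallucinated" and the toy labels and scores are assumptions made for illustration.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = hallucinated, 0 = truthful (assumed convention).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

value = auroc(labels, scores)        # fraction scale, 8 of 9 pairs correct
value_percent = round(100 * value, 1)  # percentage scale: 88.9
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value more efficiently on large evaluation sets.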
