Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models

About

Hallucinations in large language models (LLMs) refer to the phenomenon of LLMs producing responses that are coherent yet factually inaccurate. This issue undermines the effectiveness of LLMs in practical applications, necessitating research into detecting and mitigating hallucinations of LLMs. Previous studies have mainly concentrated on post-processing techniques for hallucination detection, which tend to be computationally intensive and limited in effectiveness due to their separation from the LLM's inference process. To overcome these limitations, we introduce MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection without requiring manual annotations. Additionally, we present HELM, a new benchmark for evaluating hallucination detection across multiple LLMs, featuring diverse LLM outputs and the internal states of LLMs during their inference process. Our experiments demonstrate that MIND outperforms existing state-of-the-art methods in hallucination detection.

Weihang Su, Changyue Wang, Qingyao Ai, Yiran HU, Zhijing Wu, Yujia Zhou, Yiqun Liu • 2024

Related benchmarks

Task	Dataset	Result
Hallucination Detection	TriviaQA	--	621
Hallucination Detection	TriviaQA (test)	AUC-ROC 84.5	243
Hallucination Detection	TruthfulQA	AUC (ROC) 0.53	178
Hallucination Detection	HaluEval (test)	AUC-ROC 94.5	176
Hallucination Detection	NQ	AUC 0.7369	154
Hallucination Detection	HaluEval	AUROC 0.48	131
Hallucination Detection	TruthfulQA (test)	AUC-ROC 88.9	112
Hallucination Detection	BioASQ	AUROC 0.7787	104
Reasoning	MATH 500	Accuracy (%) 77.1	94
Hallucination Detection	NQ (test)	AUC ROC 93.6	91

Showing 10 of 61 rows
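
Most rows above report an area under the ROC curve, listed on either a 0 to 1 or a 0 to 100 scale. As a reading aid, the sketch below shows one standard way to compute that metric from per-example detector scores. It is not taken from the paper's code: the function name `auroc`, the label convention (1 = hallucinated, 0 = not hallucinated), and the toy data are all assumptions made for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half.
    Assumed convention: label 1 = hallucinated, 0 = not hallucinated,
    and a higher score means "more likely hallucinated".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not results from any benchmark.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.1]
print(auroc(toy_labels, toy_scores))
```

Under this definition a value of 0.5 corresponds to chance-level ranking and 1.0 to a perfect separation of hallucinated from non-hallucinated outputs. The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable for large evaluation sets.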
