Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

About

Large language models (large LMs) are susceptible to producing text that contains hallucinated content. An important instance of this problem is self-contradiction, where the LM generates two contradictory sentences within the same context. In this work, we present a comprehensive investigation into self-contradiction for various instruction-tuned LMs, covering evaluation, detection, and mitigation. Our primary evaluation task is open-domain text generation, but we also demonstrate the applicability of our approach to shorter question answering. Our analysis reveals the prevalence of self-contradictions, e.g., in 17.7% of all sentences produced by ChatGPT. We then propose a novel prompting-based framework designed to effectively detect and mitigate self-contradictions. Our detector is highly accurate, e.g., reaching an F1 score of around 80% when prompting ChatGPT. The mitigation algorithm iteratively refines the generated text to remove contradictory information while preserving text fluency and informativeness. Importantly, our entire framework is applicable to black-box LMs and does not require retrieval of external knowledge. Rather, our method complements retrieval-based methods, as a large portion of self-contradictions (e.g., 35.2% for ChatGPT) cannot be verified using online text. Our approach is practically effective and has been released as a push-button tool for public use at https://chatprotect.ai/.
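
The detect-then-mitigate loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The LM is modeled as a hypothetical plain function from prompt to completion. The function names (`is_contradictory`, `refine`), the prompt wording, and the `max_rounds` stopping rule are all illustrative. As in the abstract, the sketch relies only on black-box prompting and no external retrieval. For each sentence, it samples an alternative sentence for the same context and asks the LM whether the pair contradicts. If so, it asks the LM to rewrite the sentence. `stub_lm` is a toy stand-in so the example runs offline.

```python
from typing import Callable, List

# Hypothetical interface: a black-box LM is any function mapping a prompt to a completion.
LM = Callable[[str], str]


def is_contradictory(lm: LM, context: str, s1: str, s2: str) -> bool:
    # Detection: ask the LM itself to judge the sentence pair (no external retrieval).
    prompt = (
        f"Context: {context}\n"
        f"Sentence A: {s1}\n"
        f"Sentence B: {s2}\n"
        "Do A and B contradict each other? Answer Yes or No."
    )
    return lm(prompt).strip().lower().startswith("yes")


def refine(lm: LM, context: str, sentences: List[str], max_rounds: int = 3) -> List[str]:
    # Mitigation: repeat until no sentence is flagged or the round budget runs out.
    for _ in range(max_rounds):
        changed = False
        revised = []
        for s in sentences:
            # Sample an alternative sentence for the same context to expose inconsistencies.
            alt = lm(f"Context: {context}\nWrite another sentence in place of: {s}")
            if is_contradictory(lm, context, s, alt):
                # Ask the LM to drop the conflicting information from s.
                s = lm(f"Remove the information that conflicts between:\n{s}\n{alt}")
                changed = True
            revised.append(s)
        sentences = revised
        if not changed:
            break
    return sentences


def stub_lm(prompt: str) -> str:
    # Toy stand-in for a real LM so the sketch runs offline.
    if "contradict" in prompt:
        return "Yes" if "1999" in prompt and "2001" in prompt else "No"
    if prompt.startswith("Remove"):
        return "The company was founded around the turn of the century."
    return "The company was founded in 2001."


print(refine(stub_lm, "Acme Corp", ["The company was founded in 1999."]))
# prints ['The company was founded around the turn of the century.']
```

In this sketch, the rewrite step revises the flagged sentence instead of deleting it. This reflects the abstract's goal of preserving fluency and informativeness. A real deployment would also need more robust parsing of the LM's Yes/No answers than a prefix check.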

Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev (2023)

Related benchmarks

Task	Dataset	Metric	Result	
Hallucination Detection	HaluEval	--	--	131
Hallucination Detection (Self-contradictory Hallucinations)	ChatProtect SC	F1 Score	83.8	12
Hallucination Detection (Math Word Problems)	UMWP	F1 Score	74	12
Hallucination Detection (Dialogue)	HaluEval DA	F1 Score	72	12
Hallucination Detection	HaluEval Sum	F1 Score	36.7	12
Math Word Problems	MWPs	R Score	80.5	10
Scientific Claims	SC	R Score	79.3	10
Dialogue Analysis	DA	R Metric	79.5	10
Summarization	SUM	ROUGE Score (R)	23	10
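
Several results in the table are F1 scores. F1 is the harmonic mean of precision and recall, and the table shows it as a percentage, so 83.8 corresponds to 0.838. A minimal sketch for a binary detector follows. It assumes each item carries one gold label and treats "self-contradictory" as the positive class. The function name `f1_score` is illustrative here, and the benchmarks above may define their positive class or averaging differently.

```python
from typing import Sequence


def f1_score(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    # Binary F1 with "is a self-contradiction" (True) as the positive class.
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = sum(p and g for p, g in zip(predicted, gold))          # correctly flagged
    fp = sum(p and not g for p, g in zip(predicted, gold))      # flagged, but consistent
    fn = sum((not p) and g for p, g in zip(predicted, gold))    # missed contradictions
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 2 true positives, 1 false positive, 1 false negative -> precision = recall = 2/3.
print(round(f1_score([True, True, False, True, False],
                     [True, False, False, True, True]), 3))
# prints 0.667
```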