Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

About

Large language models (large LMs) are susceptible to producing text that contains hallucinated content. An important instance of this problem is self-contradiction, where the LM generates two contradictory sentences within the same context. In this work, we present a comprehensive investigation into self-contradiction for various instruction-tuned LMs, covering evaluation, detection, and mitigation. Our primary evaluation task is open-domain text generation, but we also demonstrate the applicability of our approach to shorter question answering. Our analysis reveals the prevalence of self-contradictions, e.g., in 17.7% of all sentences produced by ChatGPT. We then propose a novel prompting-based framework designed to effectively detect and mitigate self-contradictions. Our detector achieves high accuracy, e.g., around 80% F1 score when prompting ChatGPT. The mitigation algorithm iteratively refines the generated text to remove contradictory information while preserving text fluency and informativeness. Importantly, our entire framework is applicable to black-box LMs and does not require retrieval of external knowledge. Rather, our method complements retrieval-based methods, as a large portion of self-contradictions (e.g., 35.2% for ChatGPT) cannot be verified using online text. Our approach is practically effective and has been released as a push-button tool to benefit the public at https://chatprotect.ai/.
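The detect-then-refine loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LM` callable, the function names (`sample_alternative`, `contradicts`, `revise`, `mitigate`), the prompt wording, the `[SAMPLE]`/`[JUDGE]`/`[REVISE]` tags, and the `max_rounds` cutoff are all hypothetical. The only properties taken from the abstract are that the LM is treated as a black box, that contradictions are judged between two sentences produced for the same context, and that the text is refined iteratively.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to the LM's reply.
# A real deployment would wrap a black-box chat API here.
LM = Callable[[str], str]


def sample_alternative(lm: LM, context: str, prefix: str) -> str:
    """Ask the LM for another candidate sentence in the same context."""
    return lm(f"[SAMPLE]\nTopic: {context}\nText so far: {prefix}\n"
              "Write the next sentence.").strip()


def contradicts(lm: LM, context: str, first: str, second: str) -> bool:
    """Ask the LM to judge whether two sentences contradict each other."""
    reply = lm(f"[JUDGE]\nTopic: {context}\nA: {first}\nB: {second}\n"
               "Do A and B contradict each other? Answer yes or no.")
    return reply.strip().lower().startswith("yes")


def revise(lm: LM, context: str, first: str, second: str) -> str:
    """Ask the LM to rewrite sentence A without the conflicting information."""
    return lm(f"[REVISE]\nTopic: {context}\nA: {first}\nB: {second}\n"
              "Rewrite A, dropping whatever conflicts with B.").strip()


def mitigate(lm: LM, context: str, sentences: List[str],
             max_rounds: int = 3) -> List[str]:
    """Repeat detect-and-revise passes until a pass finds no contradiction."""
    current = list(sentences)
    for _ in range(max_rounds):
        changed = False
        for i, sentence in enumerate(current):
            # Sample a second sentence for the same position, then compare.
            alternative = sample_alternative(lm, context, " ".join(current[:i]))
            if contradicts(lm, context, sentence, alternative):
                current[i] = revise(lm, context, sentence, alternative)
                changed = True
        if not changed:
            break  # a full pass found nothing to fix
    return current
```

Passing the LM in as a plain callable keeps the sketch independent of any particular provider and makes it easy to exercise with a deterministic stub before wiring in a real model.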

Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Hallucination Detection | HaluEval | F1 Score: 51.4 | 75 |
| Hallucination Detection (Self-contradictory Hallucinations) | ChatProtect SC | F1 Score: 83.8 | 12 |
| Hallucination Detection (Math Word Problems) | UMWP | F1 Score: 74 | 12 |
| Hallucination Detection (Dialogue) | HaluEval DA | F1 Score: 72 | 12 |
| Hallucination Detection | HaluEval Sum | F1 Score: 36.7 | 12 |
| Math Word Problems | MWPs | R Score: 80.5 | 10 |
| Scientific Claims | SC | R Score: 79.3 | 10 |
| Dialogue Analysis | DA | R Metric: 79.5 | 10 |
| Summarization | SUM | ROUGE Score (R): 23 | 10 |