On the Risk of Misinformation Pollution with Large Language Models

About

In this paper, we comprehensively investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation and its subsequent impact on information-intensive applications, particularly Open-Domain Question Answering (ODQA) systems. We establish a threat model and simulate potential misuse scenarios, both unintentional and intentional, to assess the extent to which LLMs can be utilized to produce misinformation. Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of ODQA systems. To mitigate the harm caused by LLM-generated misinformation, we explore three defense strategies: prompting, misinformation detection, and majority voting. While initial results show promising trends for these defensive strategies, much more work needs to be done to address the challenge of misinformation pollution. Our work highlights the need for further research and interdisciplinary collaboration to address LLM-generated misinformation and to promote responsible use of LLMs.

Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, William Yang Wang • 2023
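
Of the three defenses named in the abstract, majority voting is the simplest to illustrate. The sketch below is one plausible reading of that idea, not the authors' implementation: the retrieved passages are split into small groups, a reader answers the question once per group, and the most frequent answer wins, so a few polluted passages cannot dominate the result. The names `majority_vote_answer`, `group_size`, and `toy_reader` are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence


def majority_vote_answer(
    question: str,
    passages: Sequence[str],
    reader: Callable[[str, Sequence[str]], str],
    group_size: int = 1,
) -> str:
    """Answer `question` once per passage group and return the most common answer.

    `reader` is an assumed interface: any callable that maps a question and a
    group of passages to an answer string. Returns "" if no answer is produced.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")

    # Split the retrieved passages into consecutive groups.
    groups: List[Sequence[str]] = [
        passages[i : i + group_size] for i in range(0, len(passages), group_size)
    ]

    # Normalize answers lightly so trivial casing differences do not split votes.
    answers = [reader(question, group).strip().lower() for group in groups]
    answers = [a for a in answers if a]
    if not answers:
        return ""

    return Counter(answers).most_common(1)[0][0]


def toy_reader(question: str, group: Sequence[str]) -> str:
    # Hypothetical stand-in for a real reader model: echoes whatever follows
    # " is " in the first passage of the group.
    return group[0].split(" is ")[-1].rstrip(".")


passages = [
    "The capital of France is Paris.",
    "France's capital is Paris.",
    "The capital of France is Lyon.",  # simulated polluted passage
    "The capital city is Paris.",
]
voted = majority_vote_answer("What is the capital of France?", passages, toy_reader)
```

In this toy setup the single polluted passage is outvoted by the three clean ones. The vote only helps while polluted passages remain a minority of the retrieved set, which is consistent with the abstract's caution that these defenses show promising trends but do not solve the problem.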

Related benchmarks

Task	Dataset	Result
Retrieval Attack Defense	Natural Questions (NQ)	--	99
RAG Poisoning Attack Mitigation	RQA-MC	ASR (PIA) 64	15
Knowledge Poisoning Attack	FEVER k=10 (test)	Attack Success Rate (ASR) 40	15
RAG Poisoning Attack Mitigation	NQ	ASR (PIA) 10.8	15
RAG Poisoning Attack Mitigation	RQA	ASR (PIA) 15	15
RAG Attack	Natural Questions, HotpotQA, and MS-MARCO Average	Average ASR 88.333	8
Knowledge Poisoning Attack	Climate-FEVER k=10 (test)	ASR 40	5
Retrieval of adversarial passages	HotpotQA	--	1
Retrieval of adversarial passages	MS Marco	--	1

