The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence

About

To reliably assist human decision-making, LLMs must maintain factual internal beliefs against misleading injections. While current models resist explicit misinformation, we uncover a fundamental vulnerability to sophisticated, hard-to-falsify evidence. To systematically probe this weakness, we introduce MisBelief, a framework that generates misleading evidence via collaborative, multi-round interactions among multi-role LLMs. This process mimics subtle, defeasible reasoning and progressive refinement to create logically persuasive yet factually deceptive claims. Using MisBelief, we generate 4,800 instances across three difficulty levels to evaluate 7 representative LLMs. Results indicate that while models are robust to direct misinformation, they are highly sensitive to this refined evidence: belief scores in falsehoods increase by an average of 93.0%, fundamentally compromising downstream recommendations. To address this, we propose Deceptive Intent Shielding (DIS), a governance mechanism that provides an early warning signal by inferring the deceptive intent behind evidence. Empirical results demonstrate that DIS consistently mitigates belief shifts and promotes more cautious evidence evaluation.

Herun Wan, Jiaying Wu, Minnan Luo, Fanxiao Li, Zhi Zeng, Min-Yen Kan • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Belief assessment on misinformation | MISBELIEF Overall Easy | -- | 28 |
| Belief assessment on misinformation | MISBELIEF Overall (Medium) | -- | 28 |
| Belief assessment on misinformation | MISBELIEF Overall (Hard) | -- | 28 |
| Belief assessment on misinformation | MISBELIEF Health domain Easy split | -- | 28 |
| Belief assessment on misinformation | MISBELIEF Health domain (Medium) | -- | 28 |
| Belief assessment on misinformation | MISBELIEF Health domain Hard | -- | 28 |
| Belief assessment on misinformation | MISBELIEF Sports domain Easy split | -- | 28 |
| Belief assessment on misinformation | MISBELIEF Sports domain Medium | -- | 28 |
| Belief assessment on misinformation | MISBELIEF Sports domain (Hard split) | -- | 28 |
| Misinformation belief assessment | Real-world misinformation | -- | 28 |
Showing 10 of 16 rows
