Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control

About

Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning where adversarially optimized documents manipulate generated outputs. Existing defenses assume that detecting poisoned evidence prevents harm. We show this assumption is incorrect: models exhibit a monitoring-control gap -- they can detect contradictions in retrieved evidence yet still act on poisoned claims. We introduce the Cordon Principle -- no agent capable of final synthesis may access untrusted natural-language evidence -- and realize it through CORDON-MAS, a compartmentalized framework that enforces this principle architecturally by separating evidence extraction, cross-source audit, and answer synthesis into agents with asymmetric memory privileges. Across five BEIR datasets, CORDON-MAS reduces attack success rate by 92.4\% relative to undefended RAG. This reframes RAG poisoning from a detection problem to an information-flow control problem.

Zhe Yu, Wenpeng Xing, Gaolei Li, Shuguang Xiong, Hongzhi Wang, Xuyang Teng, Meng Han• 2026

Related benchmarks

TaskDatasetResultRank
Retrieval Attack DefenseFiQA
ASR4
70
End-to-End Defense in RAGHotpotQA
Attack Success Rate (ASR)0.00e+0
69
End-to-End Defense in RAGSciFact
ASR2
69
RAG Poisoning Attack MitigationNQ--
15
Poison Defense ASRMS Marco
ASR4.7
6
Question AnsweringSciFact
Answerability Rate74
6
Question AnsweringMS Marco
Answerability Rate0.79
6
Question AnsweringFiQA
Answerability Rate58
6
Question AnsweringNQ
Answerability Rate50
6
Question AnsweringHotpotQA
Answerability Rate40
6
Showing 10 of 10 rows

Other info

Follow for update