Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control

About

Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning where adversarially optimized documents manipulate generated outputs. Existing defenses assume that detecting poisoned evidence prevents harm. We show this assumption is incorrect: models exhibit a monitoring-control gap -- they can detect contradictions in retrieved evidence yet still act on poisoned claims. We introduce the Cordon Principle -- no agent capable of final synthesis may access untrusted natural-language evidence -- and realize it through CORDON-MAS, a compartmentalized framework that enforces this principle architecturally by separating evidence extraction, cross-source audit, and answer synthesis into agents with asymmetric memory privileges. Across five BEIR datasets, CORDON-MAS reduces attack success rate by 92.4\% relative to undefended RAG. This reframes RAG poisoning from a detection problem to an information-flow control problem.

Zhe Yu, Wenpeng Xing, Gaolei Li, Shuguang Xiong, Hongzhi Wang, Xuyang Teng, Meng Han• 2026

Related benchmarks

Task	Dataset	Result
Retrieval Attack Defense	FiQA	ASR4	70
End-to-End Defense in RAG	HotpotQA	Attack Success Rate (ASR)0.00e+0	69
End-to-End Defense in RAG	SciFact	ASR2	69
RAG Poisoning Attack Mitigation	NQ	--	15
Poison Defense ASR	MS Marco	ASR4.7	6
Question Answering	SciFact	Answerability Rate74	6
Question Answering	MS Marco	Answerability Rate0.79	6
Question Answering	FiQA	Answerability Rate58	6
Question Answering	NQ	Answerability Rate50	6
Question Answering	HotpotQA	Answerability Rate40	6

Showing 10 of 10 rows

Other info

Follow for update

@wizwand_team Discord