Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control
About
Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning where adversarially optimized documents manipulate generated outputs. Existing defenses assume that detecting poisoned evidence prevents harm. We show this assumption is incorrect: models exhibit a monitoring-control gap -- they can detect contradictions in retrieved evidence yet still act on poisoned claims. We introduce the Cordon Principle -- no agent capable of final synthesis may access untrusted natural-language evidence -- and realize it through CORDON-MAS, a compartmentalized framework that enforces this principle architecturally by separating evidence extraction, cross-source audit, and answer synthesis into agents with asymmetric memory privileges. Across five BEIR datasets, CORDON-MAS reduces attack success rate by 92.4\% relative to undefended RAG. This reframes RAG poisoning from a detection problem to an information-flow control problem.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Retrieval Attack Defense | FiQA | ASR4 | 70 | |
| End-to-End Defense in RAG | HotpotQA | Attack Success Rate (ASR)0.00e+0 | 69 | |
| End-to-End Defense in RAG | SciFact | ASR2 | 69 | |
| RAG Poisoning Attack Mitigation | NQ | -- | 15 | |
| Poison Defense ASR | MS Marco | ASR4.7 | 6 | |
| Question Answering | SciFact | Answerability Rate74 | 6 | |
| Question Answering | MS Marco | Answerability Rate0.79 | 6 | |
| Question Answering | FiQA | Answerability Rate58 | 6 | |
| Question Answering | NQ | Answerability Rate50 | 6 | |
| Question Answering | HotpotQA | Answerability Rate40 | 6 |