Certifiably Robust RAG against Retrieval Corruption
About
Retrieval-augmented generation (RAG) is susceptible to retrieval corruption attacks, where malicious passages injected into retrieval results can lead to inaccurate model responses. We propose RobustRAG, the first defense framework with certifiable robustness against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we isolate passages into disjoint groups, generate LLM responses based on the concatenated passages from each isolated group, and then securely aggregate these responses for a robust output. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG achieves certifiable robustness: for certain queries in our evaluation datasets, we can formally certify non-trivial lower bounds on response quality -- even against an adaptive attacker with full knowledge of the defense and the ability to arbitrarily inject a bounded number of malicious passages. We evaluate RobustRAG on the tasks of open-domain question-answering and free-form long text generation and demonstrate its effectiveness across three datasets and three LLMs.
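The isolate-then-aggregate strategy with keyword-based aggregation can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses singleton passage groups, naive whitespace tokenization as "keyword extraction", a hypothetical `answer_fn` standing in for the LLM call, and an illustrative vote threshold `tau`.

```python
from collections import Counter

def isolate_then_aggregate(query, passages, answer_fn, tau=2):
    """Sketch of RobustRAG's isolate-then-aggregate idea with
    keyword-based aggregation (all names here are illustrative)."""
    # 1. Isolation: answer the query from each passage independently,
    #    so a corrupted passage can influence at most one response.
    responses = [answer_fn(query, p) for p in passages]
    # 2. Extract keywords from each isolated response
    #    (here simplified to the set of lowercased words).
    keyword_sets = [set(r.lower().split()) for r in responses]
    # 3. Secure aggregation: keep only keywords appearing in at least
    #    tau isolated responses; each injected passage can shift each
    #    keyword's count by at most one vote.
    counts = Counter(kw for ks in keyword_sets for kw in ks)
    robust_keywords = {kw for kw, c in counts.items() if c >= tau}
    # A full system would prompt the LLM again with these robust
    # keywords to produce the final answer; we return them directly.
    return robust_keywords

# Toy demo with a fake "LLM" that echoes its passage verbatim.
def fake_llm(query, passage):
    return passage

passages = ["paris is the capital", "paris is the capital",
            "INJECTED: the capital is berlin"]
print(isolate_then_aggregate("capital of France?", passages, fake_llm))
```

Because the injected passage contributes only one isolated response, its unique keywords fall below the threshold and are filtered out of the aggregate.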
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | TriviaQA | Accuracy | 78.1 | 238 |
| Question Answering | PopQA | Accuracy | 37.1 | 186 |
| Question Answering | NQ | Accuracy | 47.8 | 123 |
| Question Answering | BioASQ | Accuracy | 56.3 | 72 |
| Long-form generation | Bio | LLM-Judge Score | 71.2 | 59 |
| Question Answering | Overall (NQ, TriviaQA, BioASQ, PopQA) | Accuracy | 0.565 | 32 |
| Short-answer QA | NQ | Accuracy | 62 | 11 |
| Short-answer QA | RQA | Accuracy | 71 | 8 |
| Question Answering | NQ Normal | F1 Score | 26.58 | 8 |
| Question Answering | TriviaQA Normal | F1 Score | 45.25 | 8 |