
Certifiably Robust RAG against Retrieval Corruption

About

Retrieval-augmented generation (RAG) is susceptible to retrieval corruption attacks, where malicious passages injected into retrieval results can lead to inaccurate model responses. We propose RobustRAG, the first defense framework with certifiable robustness against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we isolate passages into disjoint groups, generate LLM responses based on the concatenated passages from each isolated group, and then securely aggregate these responses for a robust output. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG achieves certifiable robustness: for certain queries in our evaluation datasets, we can formally certify non-trivial lower bounds on response quality -- even against an adaptive attacker with full knowledge of the defense and the ability to arbitrarily inject a bounded number of malicious passages. We evaluate RobustRAG on the tasks of open-domain question-answering and free-form long text generation and demonstrate its effectiveness across three datasets and three LLMs.
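The isolate-then-aggregate strategy described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `llm` is an assumed callable that maps a prompt to a text response, and the keyword-count threshold `tau` and `group_size` are hypothetical parameters chosen for clarity rather than the paper's certified aggregation rule.

```python
from collections import Counter

def robust_rag_answer(query, passages, llm, group_size=1, tau=2):
    """Sketch of RobustRAG's isolate-then-aggregate strategy with
    keyword-based aggregation. `llm(prompt)` is an assumed callable."""
    # 1. Isolate: partition retrieved passages into disjoint groups.
    groups = [passages[i:i + group_size]
              for i in range(0, len(passages), group_size)]

    # 2. Generate one response per isolated group, so a corrupted
    #    passage can influence at most one group's response.
    responses = [llm(f"Context: {' '.join(g)}\nQuestion: {query}")
                 for g in groups]

    # 3. Securely aggregate: keep only keywords that appear in at
    #    least `tau` independent responses, so keywords injected by a
    #    bounded number of malicious passages are filtered out; then
    #    produce the final answer from the surviving keywords.
    counts = Counter(w.lower() for r in responses for w in set(r.split()))
    keywords = [w for w, c in counts.items() if c >= tau]
    return llm(f"Answer '{query}' using these keywords: {', '.join(keywords)}")
```

Because each response depends on only one group, an attacker who injects k malicious passages can corrupt at most k of the aggregated responses, which is what makes the voting-style aggregation amenable to formal certification.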

Chong Xiang, Tong Wu, Zexuan Zhong, David Wagner, Danqi Chen, Prateek Mittal • 2024

Related benchmarks

Task                  Dataset                                 Result                 Rank
--------------------  --------------------------------------  ---------------------  ----
Question Answering    TriviaQA                                Accuracy 78.1           238
Question Answering    PopQA                                   Accuracy 37.1           186
Question Answering    NQ                                      Accuracy 47.8           123
Question Answering    BioASQ                                  Accuracy 56.3            72
Long-form generation  Bio                                     LLM-Judge Score 71.2     59
Question Answering    Overall (NQ, TriviaQA, BioASQ, PopQA)   Accuracy 0.565           32
Short-answer QA       NQ                                      Accuracy 62              11
Short-answer QA       RQA                                     Accuracy 71               8
Question Answering    NQ Normal                               F1 Score 26.58            8
Question Answering    TriviaQA Normal                         F1 Score 45.25            8
Showing 10 of 28 rows
