Certifiably Robust RAG against Retrieval Corruption
About
Retrieval-augmented generation (RAG) is susceptible to retrieval corruption attacks, where malicious passages injected into retrieval results can lead to inaccurate model responses. We propose RobustRAG, the first defense framework with certifiable robustness against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we isolate passages into disjoint groups, generate LLM responses based on the concatenated passages from each isolated group, and then securely aggregate these responses for a robust output. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG achieves certifiable robustness: for certain queries in our evaluation datasets, we can formally certify non-trivial lower bounds on response quality -- even against an adaptive attacker with full knowledge of the defense and the ability to arbitrarily inject a bounded number of malicious passages. We evaluate RobustRAG on the tasks of open-domain question-answering and free-form long text generation and demonstrate its effectiveness across three datasets and three LLMs.
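The isolate-then-aggregate strategy with keyword-based aggregation can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses singleton passage groups, naive whitespace tokenization as "keyword extraction", a hypothetical `answer_fn` standing in for the LLM call, and an illustrative vote threshold `tau`.

```python
from collections import Counter

def isolate_then_aggregate(query, passages, answer_fn, tau=2):
    """Sketch of RobustRAG's isolate-then-aggregate idea with
    keyword-based aggregation (all names here are illustrative)."""
    # 1. Isolation: answer the query from each passage independently,
    #    so a corrupted passage can influence at most one response.
    responses = [answer_fn(query, p) for p in passages]
    # 2. Extract keywords from each isolated response
    #    (here simplified to the set of lowercased words).
    keyword_sets = [set(r.lower().split()) for r in responses]
    # 3. Secure aggregation: keep only keywords appearing in at least
    #    tau isolated responses; each injected passage can shift each
    #    keyword's count by at most one vote.
    counts = Counter(kw for ks in keyword_sets for kw in ks)
    robust_keywords = {kw for kw, c in counts.items() if c >= tau}
    # A full system would prompt the LLM again with these robust
    # keywords to produce the final answer; we return them directly.
    return robust_keywords

# Toy demo with a fake "LLM" that echoes its passage verbatim.
def fake_llm(query, passage):
    return passage

passages = ["paris is the capital", "paris is the capital",
            "INJECTED: the capital is berlin"]
print(isolate_then_aggregate("capital of France?", passages, fake_llm))
```

Because the injected passage contributes only one isolated response, its unique keywords fall below the threshold and are filtered out of the aggregate.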
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | TriviaQA | Accuracy | 78.1 | 238 |
| Question Answering | PopQA | Accuracy | 37.1 | 186 |
| Question Answering | NQ | Accuracy | 47.8 | 123 |
| Question Answering | BioASQ | Accuracy | 56.3 | 72 |
| Long-form generation | Bio | LLM-Judge Score | 71.2 | 59 |
| Question Answering | Overall (NQ, TriviaQA, BioASQ, PopQA) | Accuracy | 0.565 | 32 |
| Short-answer QA | NQ | Accuracy | 62 | 11 |
| Short-answer QA | RQA | Accuracy | 71 | 8 |
| Question Answering | NQ Normal | F1 Score | 26.58 | 8 |
| Question Answering | TriviaQA Normal | F1 Score | 45.25 | 8 |