TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation

About

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user queries. These systems, however, remain susceptible to corpus poisoning attacks, which can severely impair the performance of LLMs. To address this challenge, we propose TrustRAG, a robust framework that systematically filters malicious and irrelevant content before it is retrieved for generation. Our approach employs a two-stage defense mechanism. The first stage implements a cluster filtering strategy to detect potential attack patterns. The second stage employs a self-assessment process that harnesses the internal capabilities of LLMs to detect malicious documents and resolve inconsistencies. TrustRAG provides a plug-and-play, training-free module that integrates seamlessly with any open- or closed-source language model. Extensive experiments demonstrate that TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance.

Huichi Zhou, Kin-Hei Lee, Zhonghao Zhan, Yue Chen, Zhenhao Li, Zhaoyang Wang, Hamed Haddadi, Emine Yilmaz • 2025
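
The first stage described in the abstract, cluster filtering, can be illustrated with a small sketch. The abstract does not name the clustering algorithm, the similarity measure, or any threshold, so everything here is an assumption made for illustration: a tiny 2-means over passage embeddings, cosine similarity, a hypothetical cutoff of 0.95, and the premise that passages injected for the same target query sit unusually close together in embedding space. The function names (`cosine`, `two_means`, `filter_suspicious`) are likewise hypothetical and are not taken from the TrustRAG code.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def two_means(vectors, iters=10):
    """Tiny 2-means: seed with the least similar pair, then refine."""
    if len(vectors) < 2:
        return [0] * len(vectors)
    i, j = min(combinations(range(len(vectors)), 2),
               key=lambda p: cosine(vectors[p[0]], vectors[p[1]]))
    centroids = [list(vectors[i]), list(vectors[j])]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to the more similar centroid (ties go to 0).
        labels = [0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def filter_suspicious(vectors, threshold=0.95):
    """Return indices kept after dropping any cluster that is suspiciously tight.

    A cluster counts as tight when the mean pairwise cosine similarity of its
    members exceeds `threshold` (an assumed value, not one from the paper).
    """
    labels = two_means(vectors)
    kept = []
    for c in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        pairs = list(combinations(idx, 2))
        tight = bool(pairs) and (
            sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
            > threshold
        )
        if not tight:
            kept.extend(idx)
    return sorted(kept)


# Toy embeddings: indices 0-2 are near-duplicates (standing in for injected
# passages), indices 3-4 are two distinct clean passages.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.01, 0.0],
    [0.98, 0.0, 0.02],
    [0.1, 1.0, 0.3],
    [0.2, 0.4, 1.0],
]
print(filter_suspicious(embeddings))  # [3, 4]
```

In this toy run the three near-duplicate vectors form one tight cluster and are removed, while the two clean passages survive. A real pipeline would take embeddings from the retriever, and passages that pass this stage would still go through the second, LLM-based self-assessment stage, which is not sketched here.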

Related benchmarks

Task	Dataset	Metric	Result	
Retrieval Attack Defense	FiQA	ASR	14	70
End-to-End Defense in RAG	HotpotQA	Attack Success Rate (ASR)	24.5	69
End-to-End Defense in RAG	SciFact	ASR	60	69
Long-form Question Answering	FAVA MixP (50% polluted)	VeriScore F1@k	57.19	26
Long-form Question Answering	Biography FullP (100% polluted)	VeriScore F1@k	23.42	26
Long-form Question Answering	FAVA FullP (100% polluted)	VeriScore F1@k	42.64	26
Long-form Question Answering	AlpacaFact MixP (50% polluted)	VeriScore F1@k	60.62	26
Long-form Question Answering	Biography MixP (50% polluted)	VeriScore F1@k	48.03	26
Long-form Question Answering	LongFact MixP (50% polluted)	VeriScore F1@k	65.66	26
Long-form Question Answering	LongFact FullP (100% polluted)	VeriScore F1@k	49.01	26

Showing 10 of 22 rows
