RAG with Differential Privacy

About

Retrieval-Augmented Generation (RAG) has emerged as the dominant technique for providing Large Language Models (LLMs) with fresh and relevant context, mitigating the risk of hallucinations and improving the overall quality of responses in environments with large and fast-moving knowledge bases. However, integrating external documents into the generation process raises significant privacy concerns. Indeed, once a document is added to a prompt, it is not possible to guarantee that a response will not inadvertently expose confidential data, leading to potential breaches of privacy and ethical dilemmas. This paper explores a practical solution to this problem, suitable for general knowledge extraction from personal data. It shows that differentially private token generation is a viable approach to private RAG.

Nicolas Grislain • 2024

Related benchmarks

Task                                 Dataset                                         Result                  Rank
Question Answering                   SearchQA                                        Accuracy 85.1           30
Question Answering                   MovieLens                                       Accuracy 56.8           18
Question Answering                   Medical Synth                                   Accuracy 67.06          18
Retrieval-Augmented Generation       MovieLens                                       Accuracy 56.8           9
Retrieval-Augmented Generation       Medical Synth                                   Accuracy 67.06          9
Retrieval-Augmented Generation       SearchQA                                        Accuracy 85.1           9
Membership Inference Attack Defense  RAG Evaluation Datasets NQ, PubMedQA, TriviaQA  Contextual Recall 37.9  7