
Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

About

Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can hinder the LLMs' capacity to generate comprehensive, high-quality responses. Prior RAG studies on robustness to retrieval noise often confine themselves to a limited set of noise types, deviating from real-world retrieval environments and limiting practical applicability. In this study, we first investigate retrieval noises and categorize them into three distinct types, reflecting real-world environments, and analyze the impact of each noise type on the robustness of LLMs. We then propose a novel RAG approach, Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT leverages adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noise. Concurrently, it employs multi-task learning to ensure the model can internally recognize noisy contexts. Extensive experiments demonstrate that a LLaMA-2 7B model trained with RAAT achieves significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at: https://github.com/calubkk/RAAT.
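The adaptive adversarial selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: `loss_fn`, the noise-type labels, and the weighting `alpha` are assumed names for exposition. The idea is that, at each training step, the model is trained on whichever retrieval-noise variant it currently finds hardest (highest loss), alongside an auxiliary noise-recognition objective.

```python
def adaptive_adversarial_step(loss_fn, query, answer, contexts):
    """Pick the current worst-case retrieved context.

    contexts : dict mapping a noise-type label (e.g. "golden",
               "irrelevant", "counterfactual") to a retrieved passage.
    loss_fn  : callable (query, context, answer) -> float, the model's
               generation loss on that (query, context, answer) triple.

    Returns the highest loss and the noise type that produced it, so the
    training step can backpropagate through the hardest variant.
    """
    losses = {t: loss_fn(query, ctx, answer) for t, ctx in contexts.items()}
    worst_type = max(losses, key=losses.get)
    return losses[worst_type], worst_type


def multitask_loss(gen_loss, noise_cls_loss, alpha=0.5):
    """Combine the adversarial generation loss with an auxiliary
    noise-classification loss (the multi-task component); alpha is an
    illustrative weighting hyperparameter."""
    return gen_loss + alpha * noise_cls_loss
```

For example, with a toy `loss_fn` the step selects the context with the highest loss and that loss is then combined with the classification objective before the optimizer update.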

Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, Ruifeng Xu • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | HotpotQA | F1 Score | 33.3 | 221 |
| Question Answering | PubMedQA (test) | Accuracy | 46.8 | 81 |
| Multi-hop Question Answering | HotpotQA | SubEM | 33.58 | 40 |
| Open-domain Question Answering | NaturalQuestions (NQ) | SubEM | 50.12 | 40 |
| Open-domain Question Answering | TriviaQA | SubEM | 68.54 | 40 |
| Question Answering | NQ, TriviaQA, and WebQ (test) | Accuracy | 46.2 | 21 |
| Retrieval-Augmented Generation | RAG-Bench | F1 (Golden Only) | 87.15 | 11 |
| Retrieval-Augmented Generation | PubMedQA | Accuracy | 46.6 | 8 |
| Question Answering | ConFiQA-QA (counterfactual contexts) | Accuracy | 43.5 | 7 |
| Retrieval-Augmented Generation | BioASQ | Accuracy | 64.9 | 5 |

(Showing 10 of 11 rows.)

Other info

Code: https://github.com/calubkk/RAAT
