ReFilter: Improving Robustness of Retrieval-Augmented Generation via Gated Filter
About
Retrieval-augmented generation (RAG) has become a dominant paradigm for grounding large language models (LLMs) in external evidence for knowledge-intensive question answering. A core design choice is how to fuse retrieved samples into the LLM; existing internal fusion approaches broadly fall into query-based fusion, parametric fusion, and latent-based fusion. Although effective at modest retrieval scales, these methods often fail to scale gracefully as the number of retrieved candidates k increases: a larger k improves evidence coverage, but realistic top-k retrieval inevitably includes irrelevant or redundant content and raises inference cost. To address these limitations, we propose ReFilter, a novel latent-based fusion framework that performs token-level filtering and fusion. ReFilter consists of three key components: a context encoder that encodes context features, a gated filter that weights each token, and a token fusion module that integrates the weighted token features into the LLM's hidden states. Experiments on four general-domain QA benchmarks (2WQA, HPQA, PopQA, and CWQ) show that ReFilter consistently achieves the best average performance under both in-domain adaptation and out-of-domain transfer. ReFilter further generalizes to five biomedical QA benchmarks in zero-shot transfer without domain fine-tuning, reaching 70.01% average accuracy with Qwen2.5-14B-Instruct.
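The three-component pipeline described above can be illustrated with a toy sketch. The abstract does not give ReFilter's actual gate or fusion equations, so everything below is an assumption: the context token features are taken as already produced by some context encoder, the hypothetical `gated_filter` assigns each token a sigmoid gate σ(w·c_j + b), and the hypothetical `fuse` lets each LLM hidden state attend over the gate-weighted token features and adds the result residually. It uses plain Python lists to stay dependency-free and is meant only to convey the idea of token-level filtering before fusion, not to reproduce the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_filter(context: List[Vector], w: Vector, b: float) -> List[float]:
    """Hypothetical gated filter: one scalar weight in (0, 1) per context token."""
    return [sigmoid(dot(w, c) + b) for c in context]


def fuse(hidden: List[Vector], context: List[Vector], gates: List[float]) -> List[Vector]:
    """Hypothetical token fusion: each hidden state attends over the gate-weighted
    context token features, and the attended vector is added residually.
    Tokens with gates near 0 have near-zero value vectors, so they contribute
    almost nothing to the update."""
    weighted = [[g * x for x in c] for g, c in zip(gates, context)]
    d = len(hidden[0])
    fused = []
    for h in hidden:
        # Scaled dot-product scores against the gated token features.
        scores = [dot(h, c) / math.sqrt(d) for c in weighted]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        attn = [e / z for e in exps]
        update = [sum(a * c[k] for a, c in zip(attn, weighted)) for k in range(d)]
        fused.append([x + u for x, u in zip(h, update)])
    return fused


# Toy demo: two hidden states and two context tokens (features assumed to come
# from a context encoder). The gate parameters favor the first token.
hidden = [[1.0, 0.0], [0.0, 1.0]]
context = [[2.0, 0.0], [0.0, 2.0]]
gates = gated_filter(context, w=[5.0, -5.0], b=0.0)
fused = fuse(hidden, context, gates)
print(gates)
print(fused)
```

In the demo, the gate passes the first context token and nearly zeroes the second, so the second token barely changes the fused hidden states. This mirrors the motivation for filtering: as k grows, low-weight (irrelevant or redundant) tokens can be suppressed before they reach the LLM's hidden states.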
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 61.77 | 253 |
| Medical Question Answering | MedQA | Accuracy | 67.79 | 109 |
| Medical Question Answering | PubMedQA | Accuracy | 56.8 | 45 |
| Medical Question Answering | BioASQ | Accuracy | 80.74 | 20 |
| Medical Question Answering | MMLU Med | Accuracy | 82.92 | 20 |
| Question Answering | 2WQA, HPQA, PopQA, and CWQ (test) | 2WQA In-Domain | 0.3923 | 20 |
| Question Answering | General-domain QA Benchmarks | 2WQA Score | 36.98 | 6 |