Learning to Filter Context for Retrieval-Augmented Generation
About
On-the-fly retrieval of relevant knowledge has proven an essential element of reliable systems for tasks such as open-domain question answering and fact verification. However, because retrieval systems are not perfect, generation models are required to generate outputs given partially or entirely irrelevant passages. This can cause over- or under-reliance on context, and result in problems in the generated output such as hallucinations. To alleviate these problems, we propose FILCO, a method that improves the quality of the context provided to the generator by (1) identifying useful context based on lexical and information-theoretic approaches, and (2) training context filtering models that can filter retrieved contexts at test time. We experiment on six knowledge-intensive tasks with FLAN-T5 and LLaMa2, and demonstrate that our method outperforms existing approaches on extractive question answering (QA), complex multi-hop and long-form QA, fact verification, and dialog generation tasks. FILCO effectively improves the quality of context, whether or not it supports the canonical output.
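To make step (1) of the abstract concrete, here is a minimal sketch of what a lexical usefulness signal for retrieved sentences could look like. It is an illustration under stated assumptions, not the authors' implementation. The function names (`string_inclusion`, `unigram_f1`, `filter_context`), the tokenizer, and the `threshold` value are all hypothetical. The information-theoretic measure and the trained filtering model from step (2) are not shown.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def string_inclusion(sentence: str, answer: str) -> bool:
    """Hypothetical signal: does the sentence contain the answer string verbatim?"""
    return answer.lower() in sentence.lower()


def unigram_f1(candidate: str, reference: str) -> float:
    """Hypothetical signal: unigram F1 overlap between two texts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def filter_context(sentences: list[str], reference: str,
                   threshold: float = 0.2) -> list[str]:
    """Keep sentences whose overlap with `reference` meets the threshold.

    The threshold of 0.2 is an arbitrary value chosen for illustration.
    """
    return [s for s in sentences if unigram_f1(s, reference) >= threshold]


retrieved = [
    "Paris is the capital of France.",
    "The Eiffel Tower opened in 1889.",
    "Bananas are rich in potassium.",
]
kept = filter_context(retrieved, "What is the capital of France?")
```

In this toy example only the first sentence survives the filter. Labels derived this way could then serve as supervision for a filtering model that has no access to the reference at test time, which is the role the abstract assigns to step (2).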
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | HotpotQA (test) | F1 | 58.15 | 255 |
| Multi-hop Question Answering | 2Wiki | Exact Match | 27.2 | 152 |
| Open Question Answering | Natural Questions (NQ) (test) | Exact Match (EM) | 27.3 | 134 |
| Multi-hop Question Answering | Multi-hop RAG | F1 | 37.2 | 77 |
| Question Answering | NQ | EM | 45.8 | 69 |
| Retrieval | HotpotQA | R@5 | 88.4 | 36 |
| Retrieval | NQ | R@5 | 70.1 | 19 |
| Retrieval | PopQA | R@5 | 58.2 | 19 |
| Retrieval | 2Wiki | Recall@5 | 76 | 19 |
| Open-domain Question Answering | Natural Questions (test) | EM | 44.79 | 18 |
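
For readers unfamiliar with the metrics in the table, the sketch below shows how Exact Match and Recall@5 are commonly computed. The answer normalization (lowercasing, stripping punctuation and English articles) follows a widely used convention and is an assumption here; the individual benchmarks may normalize or define recall differently. The function names and document IDs are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize(prediction)
    return float(any(pred == normalize(gold) for gold in gold_answers))


def recall_at_k(ranked_ids: list[str], relevant_ids: list[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


em = exact_match("The Eiffel Tower.", ["Eiffel Tower"])
r5 = recall_at_k(["d3", "d1", "d9", "d4", "d7", "d2"], ["d1", "d2"], k=5)
```

Scores such as those in the table are typically these per-example values averaged over the dataset and multiplied by 100.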