Learning to Filter Context for Retrieval-Augmented Generation
About
On-the-fly retrieval of relevant knowledge has proven an essential element of reliable systems for tasks such as open-domain question answering and fact verification. However, because retrieval systems are imperfect, generation models must produce outputs even when given partially or entirely irrelevant passages. This can cause over- or under-reliance on context and lead to problems in the generated output, such as hallucination. To alleviate these problems, we propose FILCO, a method that improves the quality of the context provided to the generator by (1) identifying useful context based on lexical and information-theoretic approaches, and (2) training context filtering models that can filter retrieved contexts at test time. We experiment on six knowledge-intensive tasks with FLAN-T5 and LLaMa2, and demonstrate that our method outperforms existing approaches on extractive question answering (QA), complex multi-hop and long-form QA, fact verification, and dialog generation tasks. FILCO effectively improves the quality of context, whether or not it supports the canonical output.
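To give a concrete feel for the lexical side of step (1), the sketch below scores each candidate context sentence by its unigram F1 overlap with the expected output and keeps the best-scoring one. This is a minimal illustration under assumed details (function names, whitespace tokenization, and the keep-one-sentence policy are ours), not the paper's exact implementation.

```python
# Hypothetical sketch of lexical context filtering: score each candidate
# sentence by unigram F1 overlap with the expected output, keep the best one.
# (Illustrative only -- not FILCO's exact scoring or selection procedure.)
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram F1 between two whitespace-tokenized, lowercased strings."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def filter_context(sentences: list[str], expected_output: str) -> str:
    """Return the sentence most lexically supportive of the expected output."""
    return max(sentences, key=lambda s: unigram_f1(s, expected_output))


sentences = [
    "The Eiffel Tower is in Paris.",
    "It was completed in 1889.",
    "Paris is the capital of France.",
]
print(filter_context(sentences, "The Eiffel Tower was completed in 1889."))
# prints "It was completed in 1889."
```

In the paper's setting, filtered spans like this serve as training targets for a context filtering model, which can then prune retrieved passages at test time without access to the gold output.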
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-hop Question Answering | HotpotQA (test) | F1: 58.15 | 198 |
| Open Question Answering | Natural Questions (NQ) (test) | Exact Match (EM): 27.3 | 134 |
| Open-domain Question Answering | Natural Questions (test) | EM: 44.79 | 18 |
| Open-domain QA | TriviaQA (TQA) (test) | EM: 60.4 | 10 |
| Open-domain QA | HotpotQA (HQA) (test) | Exact Match: 0.239 | 10 |
| Multi-hop Question Answering | MuSiQue (OOD) | EM: 8.36 | 6 |
| Multi-hop Question Answering | 2WikiQA (OOD) | EM: 27.5 | 6 |