Rethinking the Reranker: Boundary-Aware Evidence Selection for Robust Retrieval-Augmented Generation
About
Retrieval-Augmented Generation (RAG) systems remain brittle under realistic retrieval noise, even when the required evidence appears in the top-K results. A key reason is that retrievers and rerankers optimize solely for relevance. They often select either trivial, answer-revealing passages or evidence that lacks information critical to answering the question, and they do not consider whether the evidence suits the generator. We propose BAR-RAG, which reframes the reranker as a boundary-aware evidence selector that targets the generator's Goldilocks Zone: evidence that is neither trivially easy nor fundamentally unanswerable for the generator, but challenging yet sufficient for inference, and that therefore provides the strongest learning signal. BAR-RAG trains the selector with reinforcement learning using generator feedback. It then adopts a two-stage pipeline that fine-tunes the generator under the induced evidence distribution to mitigate the distribution mismatch between training and inference. Experiments on knowledge-intensive question answering benchmarks show that BAR-RAG consistently improves end-to-end performance under noisy retrieval, achieving an average gain of 10.3% over strong RAG and reranking baselines while substantially improving robustness. Code is publicly available at https://github.com/GasolSun36/BAR-RAG.
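To make the Goldilocks Zone idea concrete, here is a minimal sketch of how a boundary-aware score could be computed from generator feedback. It is not the BAR-RAG implementation. It assumes, hypothetically, that the generator is sampled several times per candidate evidence set and that each sample is judged by SQuAD-style exact match. The names `empirical_accuracy`, `goldilocks_reward`, and `pick_boundary_evidence` are illustrative. The 4p(1 - p) reward shape is an assumed stand-in for whatever reward the paper actually uses. It is zero when the generator always succeeds (trivially easy evidence) or always fails (unanswerable evidence), and it peaks at p = 0.5.

```python
import re
import string
from typing import Mapping, Sequence

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, collapse spaces."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(gold)


def empirical_accuracy(samples: Sequence[str], gold: str) -> float:
    """Fraction of sampled generator answers that exactly match the gold answer."""
    if not samples:
        return 0.0
    return sum(exact_match(s, gold) for s in samples) / len(samples)


def goldilocks_reward(accuracy: float) -> float:
    """Hypothetical boundary-aware reward: 0 at accuracy 0 or 1, maximum 1.0 at accuracy 0.5."""
    return 4.0 * accuracy * (1.0 - accuracy)


def pick_boundary_evidence(sampled: Mapping[str, Sequence[str]], gold: str) -> str:
    """Return the candidate evidence id whose generator samples earn the highest reward.

    Illustration only: the paper trains the selector with RL rather than taking an argmax.
    """
    return max(sampled, key=lambda cid: goldilocks_reward(empirical_accuracy(sampled[cid], gold)))


if __name__ == "__main__":
    # Toy generator samples under three hypothetical candidate evidence sets.
    gold = "Paris"
    sampled = {
        "answer-revealing": ["Paris", "paris", "Paris.", "Paris"],  # always correct: too easy
        "boundary": ["Paris", "Lyon", "The Paris", "Marseille"],    # mixed: informative
        "insufficient": ["Lyon", "Nice", "Berlin", "unknown"],      # never correct: unanswerable
    }
    for cid, answers in sampled.items():
        acc = empirical_accuracy(answers, gold)
        print(cid, acc, goldilocks_reward(acc))
    print("selected:", pick_boundary_evidence(sampled, gold))
```

On this toy input, the answer-revealing and insufficient sets both score 0.0, and the mixed set scores 1.0 and is selected. A smooth, symmetric shape like 4p(1 - p) is one simple way to express "neither trivially easy nor unanswerable". Other peaked functions would serve the same illustrative purpose.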
Related benchmarks
| Task | Dataset | Metric (EM = Exact Match) | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMultihopQA | EM | 33 | 278 |
| Question Answering | PopQA | -- | -- | 186 |
| Multi-hop Question Answering | Bamboogle | EM | 39.6 | 97 |
| Question Answering | MuSiQue | EM | 12 | 84 |
| Question Answering | PopQA | EM | 48.6 | 80 |
| Question Answering | 2WikiMultihopQA | EM | 33 | 73 |
| Question Answering | Bamboogle | EM | 39.6 | 62 |
| Multi-hop Question Answering | HotpotQA | EM | 41.2 | 56 |
| Question Answering | NQ (Natural Questions) | EM | 49.5 | 55 |
| Multi-hop Question Answering | MuSiQue | EM | 12.5 | 27 |