Shifting from Ranking to Set Selection for Retrieval Augmented Generation
About
Retrieval in Retrieval-Augmented Generation (RAG) must ensure that retrieved passages are not only individually relevant but also collectively form a comprehensive set. Existing approaches primarily rerank the top-k passages by individual relevance, and often fail to meet the information needs of complex queries in multi-hop question answering. In this work, we propose a set-wise passage selection approach and introduce SETR, which explicitly identifies the information requirements of a query through Chain-of-Thought reasoning and selects an optimal set of passages that collectively satisfy those requirements. Experiments on multi-hop RAG benchmarks show that SETR outperforms both proprietary LLM-based rerankers and open-source baselines in answer correctness and retrieval quality, providing an effective and efficient alternative to traditional rerankers in RAG systems. The code is available at https://github.com/LGAI-Research/SetR
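The contrast between ranking and set selection can be illustrated with a toy sketch. This is not the SETR implementation: SETR uses an LLM with Chain-of-Thought reasoning to identify requirements and choose passages, whereas the sketch below assumes hand-labeled requirement tags and substitutes a simple greedy coverage heuristic. All names (`top_k_by_score`, `select_set`, the passage IDs, scores, and tags) are hypothetical.

```python
from typing import Dict, List, Set


def top_k_by_score(scores: Dict[str, float], k: int) -> List[str]:
    """Ranking baseline: keep the k passages with the highest individual score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def select_set(
    requirements: Set[str],
    covers: Dict[str, Set[str]],
    scores: Dict[str, float],
    max_passages: int,
) -> List[str]:
    """Greedy set selection (an assumed stand-in for LLM-based selection).

    Repeatedly adds the passage that covers the most unmet requirements,
    breaking ties by individual score, until every requirement is covered
    or the passage budget is used up.
    """
    remaining = set(requirements)
    chosen: List[str] = []
    while remaining and len(chosen) < max_passages:
        candidates = [p for p in covers if p not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda p: (len(covers[p] & remaining), scores[p]))
        if not covers[best] & remaining:
            break  # no candidate adds new information
        chosen.append(best)
        remaining -= covers[best]
    return chosen


# Hypothetical two-hop query: it needs a passage about the film
# and a passage about that film's director.
requirements = {"film", "director"}
scores = {"p1": 0.90, "p2": 0.85, "p3": 0.40}
covers = {"p1": {"film"}, "p2": {"film"}, "p3": {"director"}}

ranked = top_k_by_score(scores, k=2)
selected = select_set(requirements, covers, scores, max_passages=2)
print(ranked)    # two near-duplicate passages about the film
print(selected)  # one passage per requirement
```

In this toy data, top-2 ranking keeps `p1` and `p2`, which both cover only the film, while the set-wise selection keeps `p1` and `p3` and so covers both requirements within the same budget.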
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMultihopQA | EM | 35.44 | 278 |
| Multi-hop Question Answering | HotpotQA | F1 Score | 38.11 | 221 |
| Multi-hop Question Answering | Multi-hop RAG | -- | -- | 65 |
| End-to-end Question Answering | HotpotQA (test val) | EM | 36.68 | 20 |
| End-to-end Question Answering | 2WikiMultiHopQA (test val) | EM | 35.44 | 20 |
| End-to-end Question Answering | MuSiQue (test val) | EM | 10.79 | 20 |
| End-to-end Question Answering | MultiHopRAG (test val) | Accuracy | 47.14 | 20 |
| Information Retrieval | MultiHopRAG (test) | MRR@10 | 57.42 | 11 |