Shifting from Ranking to Set Selection for Retrieval Augmented Generation

About

Retrieval in Retrieval-Augmented Generation (RAG) must ensure that retrieved passages are not only individually relevant but also collectively form a comprehensive set. Existing approaches primarily rerank the top-k passages by individual relevance, and often fail to meet the information needs of complex queries in multi-hop question answering. In this work, we propose a set-wise passage selection approach and introduce SETR, which explicitly identifies the information requirements of a query through Chain-of-Thought reasoning and selects an optimal set of passages that collectively satisfy those requirements. Experiments on multi-hop RAG benchmarks show that SETR outperforms both proprietary LLM-based rerankers and open-source baselines in answer correctness and retrieval quality, providing an effective and efficient alternative to traditional rerankers in RAG systems. The code is available at https://github.com/LGAI-Research/SetR
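
The contrast between ranking by individual relevance and selecting a set that jointly covers a query's information requirements can be illustrated with a toy sketch. This is not the SETR implementation: SETR identifies requirements and selects passages with an LLM and Chain-of-Thought reasoning, whereas the sketch below swaps in a plain greedy coverage heuristic. The passage ids, scores, requirement labels, and the function names `top_k_by_score` and `select_set` are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set


def top_k_by_score(scores: Dict[str, float], k: int) -> List[str]:
    """Rank passages by individual relevance and keep the k best."""
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)[:k]


def select_set(requirements: Set[str],
               covers: Dict[str, Set[str]],
               max_size: int) -> List[str]:
    """Greedily pick passages that cover the most still-unmet requirements.

    Assumption: a greedy coverage heuristic standing in for the LLM-based
    selection described in the abstract.
    """
    remaining = set(requirements)
    chosen: List[str] = []
    while remaining and len(chosen) < max_size:
        # Sorting makes tie-breaking deterministic (smallest id wins).
        candidates = sorted(pid for pid in covers if pid not in chosen)
        if not candidates:
            break
        best = max(candidates, key=lambda pid: len(covers[pid] & remaining))
        if not covers[best] & remaining:
            break  # no remaining passage adds new information
        chosen.append(best)
        remaining -= covers[best]
    return chosen


# Hypothetical two-hop query: it needs a film's director and that
# director's birthplace.
requirements = {"director_of_film", "birthplace_of_director"}
scores = {"p1": 0.95, "p2": 0.93, "p3": 0.60}
covers = {
    "p1": {"director_of_film"},
    "p2": {"director_of_film"},  # redundant with p1
    "p3": {"birthplace_of_director"},
}

print(top_k_by_score(scores, k=2))                 # two redundant passages
print(select_set(requirements, covers, max_size=2))  # one passage per hop
```

In this toy data the two highest-scoring passages answer the same hop, so top-2 ranking leaves the second hop uncovered, while set selection trades the redundant passage for the lower-scoring one that completes the requirement set.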

Dahyun Lee, Yongrae Jo, Haeju Park, Moontae Lee • 2025

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	EM 35.44	559
Multi-hop Question Answering	HotpotQA	F1 Score 38.11	294
Multi-hop Question Answering	Multi-hop RAG	--	77
End-to-end Question Answering	HotpotQA (test val)	EM 36.68	20
End-to-end Question Answering	2WikiMultiHopQA (test val)	EM 35.44	20
End-to-end Question Answering	MuSiQue (test val)	EM 10.79	20
End-to-end Question Answering	MultiHopRAG (test val)	Accuracy 47.14	20
Information Retrieval	MultiHopRAG (test)	MRR@10 57.42	11
Multi-hop Question Answering	HotpotQA fullwiki (val)	Exact Match (EM) 39.2	7
Multi-hop Question Answering	MuSiQue fullwiki (val)	Exact Match (EM) 12.3	7
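
For readers unfamiliar with the metrics in the table, the sketch below shows how Exact Match, token-level F1, and MRR@k are commonly computed. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the scoring scripts behind the numbers above may normalize differently, and the function names are hypothetical. The functions return fractions in [0, 1], while the table appears to report percentages.

```python
import re
import string
from collections import Counter
from typing import List, Sequence, Set


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mrr_at_k(rankings: Sequence[Sequence[str]],
             relevant: Sequence[Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant item within the top k."""
    total = 0.0
    for ranked, gold in zip(rankings, relevant):
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)
```

A query whose first relevant passage appears at rank 2 contributes 0.5 to MRR@10, and a query with no relevant passage in the top 10 contributes 0.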
