Chain-of-Retrieval Augmented Generation

About

This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually perform a single retrieval step before the generation process, which limits their effectiveness in addressing complex queries due to imperfect retrieval results. In contrast, our proposed method, CoRAG (Chain-of-Retrieval Augmented Generation), allows the model to dynamically reformulate the query based on the evolving state. To train CoRAG effectively, we utilize rejection sampling to automatically generate intermediate retrieval chains, thereby augmenting existing RAG datasets that only provide the correct final answer. At test time, we propose various decoding strategies to scale the model's test-time compute by controlling the length and number of sampled retrieval chains. Experimental results across multiple benchmarks validate the efficacy of CoRAG, particularly in multi-hop question answering tasks, where we observe an improvement of more than 10 points in EM score compared to strong baselines. On the KILT benchmark, CoRAG establishes new state-of-the-art performance across a diverse range of knowledge-intensive tasks. Furthermore, we offer comprehensive analyses to understand the scaling behavior of CoRAG, laying the groundwork for future research aimed at developing factual and grounded foundation models.
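
The rejection-sampling idea in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the paper's implementation: the corpus, `retrieve`, `propose_sub_query`, `final_answer`, and the exact-match acceptance rule are all hypothetical stand-ins. A real system would use a learned retriever, an LLM that conditions on the question and the chain so far, and its own acceptance criterion.

```python
import random

# Hypothetical toy corpus mapping sub-queries to retrieved text.
CORPUS = {
    "who directed Inception": "Christopher Nolan",
    "where was Christopher Nolan born": "London",
    "who wrote Dune": "Frank Herbert",
}


def retrieve(sub_query):
    """Stand-in retriever: exact-key lookup in the toy corpus."""
    return CORPUS.get(sub_query, "")


def propose_sub_query(question, chain, rng):
    """Stand-in for the model's next sub-query.

    A real model would condition on `question` and `chain` (the evolving
    state); this toy version ignores both and samples uniformly.
    """
    return rng.choice(sorted(CORPUS))


def sample_chain(question, rng, max_steps):
    """Sample one retrieval chain of (sub_query, retrieved_text) pairs."""
    chain = []
    for _ in range(max_steps):
        sub_query = propose_sub_query(question, chain, rng)
        chain.append((sub_query, retrieve(sub_query)))
    return chain


def final_answer(chain):
    """Stand-in answer generator: return the last retrieved text."""
    return chain[-1][1] if chain else ""


def rejection_sample(question, gold_answer, num_samples=50, max_steps=2, seed=0):
    """Keep only the sampled chains whose final answer matches the gold answer.

    Exact match is an assumed acceptance rule chosen for illustration.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_samples):
        chain = sample_chain(question, rng, max_steps)
        if final_answer(chain) == gold_answer:
            accepted.append(chain)
    return accepted


if __name__ == "__main__":
    chains = rejection_sample(
        "Where was the director of Inception born?", "London", num_samples=200
    )
    print(f"accepted {len(chains)} of 200 sampled chains")
```

In this sketch, `max_steps` and `num_samples` play the role of the two test-time knobs the abstract mentions: the length of each retrieval chain and the number of chains sampled.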

Liang Wang, Haonan Chen, Nan Yang, Xiaolong Huang, Zhicheng Dou, Furu Wei • 2025

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	--	559
Multi-hop Question Answering	HotpotQA (test)	F1 52.85	311
Multi-hop Question Answering	Bamboogle (test)	EM 31.61	98
Multi-hop Question Answering	2WikiQA (test)	F1 46.62	71
Claim Verification	HoVer (test)	Accuracy 40.82	31
Multi-hop Question Answering	2Wiki	MBE 59	17
Multi-hop Question Answering	HotpotQA	MBE 58.2	17
Multi-hop Retrieval	HotpotQA	Latency (s) 0.1486	15
Ambiguous Question Answering	AMBIGQA (test)	Accuracy 32.1	13
Multi-hop Question Answering	MuSiQue	Recall 54	6
