RAG-R1: Incentivizing the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
About
Large Language Models (LLMs), despite their remarkable capabilities, are prone to generating hallucinated or outdated content due to their static internal knowledge. While Retrieval-Augmented Generation (RAG) integrated with Reinforcement Learning (RL) offers a solution, these methods are fundamentally constrained by a single-query mode, leading to prohibitive latency and inherent brittleness. To overcome these limitations, we introduce RAG-R1, a novel two-stage training framework centered around multi-query parallelism. Our framework enables LLMs to adaptively leverage internal and external knowledge during the reasoning process while transitioning from the single-query mode to multi-query parallelism. This architectural shift bolsters reasoning robustness while significantly reducing inference latency. Extensive experiments on seven question-answering benchmarks confirm the superiority of our method, which outperforms the strongest baseline by up to 13.7% and decreases inference time by 11.1%.
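The latency benefit of multi-query parallelism comes from issuing a batch of retrieval queries concurrently rather than one at a time, so wall-clock cost is bounded by the slowest query instead of the sum. A minimal sketch of this idea, using a toy in-memory retriever (the `retrieve` function and corpus are illustrative stand-ins, not the paper's actual retrieval backend):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for a real search index; illustrative only.
CORPUS = {
    "capital of France": "Paris is the capital of France.",
    "capital of Japan": "Tokyo is the capital of Japan.",
}

def retrieve(query: str) -> str:
    """Look up a single query against the toy corpus."""
    return CORPUS.get(query, "no result")

def retrieve_parallel(queries: list[str]) -> list[str]:
    """Issue all queries concurrently; with a real network-backed
    retriever, total latency tracks the slowest query, not the sum."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(retrieve, queries))

results = retrieve_parallel(["capital of France", "capital of Japan"])
```

In the framework described above, the model is additionally trained to decide *when* to emit such a batch of queries versus answering from internal knowledge; this sketch covers only the parallel-retrieval step.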
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | FRAMES | Accuracy | 45.6 | 14 |
| Question Answering | MuSiQue | F1 Score | 29.7 | 9 |
| Question Answering | FanOutQA | F1 Score | 28.2 | 9 |
| Question Answering | MedBrowseComp | F1 Score | 19.2 | 9 |
| Question Answering | BrowseComp | F1 Score | 5.9 | 9 |
| Evidence Retrieval | FanOutQA | Evidence Coverage Rate | 53.2 | 6 |
| Evidence Retrieval | MuSiQue | Evidence Coverage Rate | 35.9 | 6 |
| Evidence Retrieval | FRAMES | Evidence Coverage Rate | 48.0 | 6 |