
RAG-R1: Incentivizing the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism

About

Large Language Models (LLMs), despite their remarkable capabilities, are prone to generating hallucinated or outdated content due to their static internal knowledge. While Retrieval-Augmented Generation (RAG) integrated with Reinforcement Learning (RL) offers a solution, these methods are fundamentally constrained by a single-query mode, leading to prohibitive latency and inherent brittleness. To overcome these limitations, we introduce RAG-R1, a novel two-stage training framework centered around multi-query parallelism. Our framework enables LLMs to adaptively leverage internal and external knowledge during the reasoning process while transitioning from the single-query mode to multi-query parallelism. This architectural shift bolsters reasoning robustness while significantly reducing inference latency. Extensive experiments on seven question-answering benchmarks confirm the superiority of our method, which outperforms the strongest baseline by up to 13.7% and decreases inference time by 11.1%.
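The core architectural shift the abstract describes is replacing one-retrieval-at-a-time querying with issuing several search queries concurrently per reasoning step. The sketch below illustrates that idea only in the abstract; it is not the paper's implementation. The toy corpus, the `retrieve` function, and the thread-pool fan-out are all illustrative assumptions standing in for a real search backend.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus standing in for a real retrieval backend (assumption).
CORPUS = {
    "capital of France": "Paris is the capital of France.",
    "author of Hamlet": "Hamlet was written by William Shakespeare.",
    "speed of light": "The speed of light is about 299,792 km/s.",
}

def retrieve(query: str) -> str:
    """Single-query retrieval: one lookup per call (toy exact match)."""
    return CORPUS.get(query, "")

def retrieve_parallel(queries: list[str]) -> list[str]:
    """Multi-query parallelism: fan out all queries in one retrieval round.

    A sequential (single-query) agent would pay one retrieval round-trip per
    query; here the round-trips overlap, so wall-clock latency is roughly that
    of the slowest single query.
    """
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(retrieve, queries))

docs = retrieve_parallel(["capital of France", "speed of light"])
```

In a real RAG-with-RL setup the queries would be generated by the LLM mid-reasoning and sent to a search service; the point of the sketch is only that batching them into one parallel round, rather than looping, is what reduces inference latency.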

Zhiwen Tan, Jiaming Huang, Qintong Wu, Hongxuan Zhang, Chenyi Zhuang, Jinjie Gu • 2025

Related benchmarks

Task                 Dataset         Metric                   Result   Rank
Question Answering   FRAMES          Accuracy                 45.6     14
Question Answering   MuSiQue         F1 Score                 29.7     9
Question Answering   FanOutQA        F1 Score                 28.2     9
Question Answering   MedBrowseComp   F1 Score                 19.2     9
Question Answering   BrowseComp      F1 Score                 5.9      9
Evidence Retrieval   FanOutQA        Evidence Coverage Rate   53.2     6
Evidence Retrieval   MuSiQue         Evidence Coverage Rate   35.9     6
Evidence Retrieval   FRAMES          Evidence Coverage Rate   48.0     6
