RAG-R1: Incentivizing the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
About
Large Language Models (LLMs), despite their remarkable capabilities, are prone to generating hallucinated or outdated content due to their static internal knowledge. While Retrieval-Augmented Generation (RAG) integrated with Reinforcement Learning (RL) offers a solution, these methods are fundamentally constrained by a single-query mode, leading to prohibitive latency and inherent brittleness. To overcome these limitations, we introduce RAG-R1, a novel two-stage training framework centered around multi-query parallelism. Our framework enables LLMs to adaptively leverage internal and external knowledge during the reasoning process while transitioning from the single-query mode to multi-query parallelism. This architectural shift bolsters reasoning robustness while significantly reducing inference latency. Extensive experiments on seven question-answering benchmarks confirm the superiority of our method, which outperforms the strongest baseline by up to 13.7% and decreases inference time by 11.1%.
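The latency benefit of multi-query parallelism comes from issuing a batch of retrieval queries concurrently rather than one at a time, so wall-clock cost is bounded by the slowest query instead of the sum. A minimal sketch of this idea, using a toy in-memory retriever (the `retrieve` function and corpus are illustrative stand-ins, not the paper's actual retrieval backend):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for a real search index; illustrative only.
CORPUS = {
    "capital of France": "Paris is the capital of France.",
    "capital of Japan": "Tokyo is the capital of Japan.",
}

def retrieve(query: str) -> str:
    """Look up a single query against the toy corpus."""
    return CORPUS.get(query, "no result")

def retrieve_parallel(queries: list[str]) -> list[str]:
    """Issue all queries concurrently; with a real network-backed
    retriever, total latency tracks the slowest query, not the sum."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(retrieve, queries))

results = retrieve_parallel(["capital of France", "capital of Japan"])
```

In the framework described above, the model is additionally trained to decide *when* to emit such a batch of queries versus answering from internal knowledge; this sketch covers only the parallel-retrieval step.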
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | FRAMES | Accuracy | 45.6 | 14 |
| Question Answering | MuSiQue | F1 Score | 29.7 | 9 |
| Question Answering | FanOutQA | F1 Score | 28.2 | 9 |
| Question Answering | MedBrowseComp | F1 Score | 19.2 | 9 |
| Question Answering | BrowseComp | F1 Score | 5.9 | 9 |
| Evidence Retrieval | FanOutQA | Evidence Coverage Rate | 53.2 | 6 |
| Evidence Retrieval | MuSiQue | Evidence Coverage Rate | 35.9 | 6 |
| Evidence Retrieval | FRAMES | Evidence Coverage Rate | 48.0 | 6 |