R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

About

Large Language Models (LLMs) are powerful but prone to hallucinations because their knowledge is static. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the model's internal knowledge. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism that encourages the use of internal knowledge, and integrates a memorization mechanism that continuously assimilates retrieved information, thereby enriching the model's internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods while achieving efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
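
The abstract does not give the reward formula, so the sketch below is only a hypothetical illustration of how an outcome-supervised reward could be combined with a bonus for relying on internal knowledge. It assumes a group of rollouts per question, a simple format check, normalized exact-match scoring, and a bonus for the correct rollout(s) that called the search engine the fewest times. The names `Rollout`, `outcome_reward`, `group_rewards`, the -1.0 format penalty, and the 0.5 bonus are all assumptions, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    answer: str           # final answer extracted from the model output (hypothetical field)
    well_formatted: bool  # whether the output followed the expected format (hypothetical)
    num_retrievals: int   # number of external search calls made in this rollout


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace before comparing answers.
    return " ".join(text.lower().split())


def outcome_reward(rollout: Rollout, gold: str) -> float:
    # Outcome supervision: only the format and the final answer are scored,
    # not the intermediate reasoning or search steps.
    if not rollout.well_formatted:
        return -1.0  # assumed format penalty
    return 1.0 if normalize(rollout.answer) == normalize(gold) else 0.0


def group_rewards(rollouts: list[Rollout], gold: str, bonus: float = 0.5) -> list[float]:
    # Base outcome reward for every rollout sampled for the same question.
    base = [outcome_reward(r, gold) for r in rollouts]
    correct = {i for i, b in enumerate(base) if b == 1.0}
    if not correct:
        return base
    # Assumed internal-knowledge bonus: among correct rollouts, reward the
    # one(s) that needed the fewest external retrievals.
    fewest = min(rollouts[i].num_retrievals for i in correct)
    return [
        b + bonus if i in correct and rollouts[i].num_retrievals == fewest else b
        for i, b in enumerate(base)
    ]


rollouts = [
    Rollout("Paris", True, 2),   # correct, but searched twice
    Rollout("paris", True, 0),   # correct using internal knowledge only
    Rollout("Lyon", True, 1),    # wrong answer
    Rollout("Paris", False, 0),  # malformed output
]
print(group_rewards(rollouts, "Paris"))  # [1.0, 1.5, 0.0, -1.0]
```

In this toy setup, the rollout that answers correctly without searching receives the highest reward, which is the kind of pressure toward using internal knowledge the abstract describes.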

Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Open-domain Question Answering | 2WikiMultiHopQA (in-domain) | F1 | 61.2 | 57
Open-domain Question Answering | MuSiQue (out-of-domain) | F1 | 33.8 | 57
Open-domain Question Answering | HotpotQA (in-domain) | F1 | 59 | 57
Open-domain Question Answering | Bamboogle v1 (out-of-domain) | F1 | 60.8 | 33
Open-domain Question Answering | Bamboogle (out-of-domain) | F1 | 60.8 | 24
Multi-hop Question Answering | HotpotQA | CEM (%) | 64.2 | 12
Multi-hop Question Answering | Bamboogle | CEM (%) | 58.7 | 12
Multi-hop Question Answering | MuSiQue | CEM (%) | 32.3 | 12
Question Answering | PopQA | CEM (%) | 59 | 12
Multi-hop Question Answering | 2Wiki | CEM (%) | 63.2 | 12
Showing 10 of 12 rows
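
The table reports two metrics, F1 and CEM. The sketch below assumes F1 is the common SQuAD-style token-overlap F1 and that CEM means "cover exact match" (a prediction counts as correct if the normalized gold answer appears as a whole-word span inside the normalized prediction). The normalization rules and the helper names `normalize_answer`, `f1_score`, and `cover_exact_match` are illustrative assumptions; the paper's evaluation script may differ.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    # SQuAD-style normalization: lowercase, drop punctuation and articles,
    # then collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction: str, gold: str) -> float:
    # Token-level F1 between prediction and gold answer.
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def cover_exact_match(prediction: str, gold: str) -> float:
    # 1.0 if the normalized gold answer occurs as a whole-word span inside the
    # normalized prediction; padding with spaces avoids partial-word matches.
    return float(f" {normalize_answer(gold)} " in f" {normalize_answer(prediction)} ")


print(f1_score("The Eiffel Tower", "Eiffel Tower"))             # 1.0
print(cover_exact_match("It is Paris, France.", "paris"))      # 1.0
print(cover_exact_match("Lyon", "Paris"))                      # 0.0
```

CEM is more lenient than exact match for verbose model outputs, while F1 gives partial credit for overlapping tokens; benchmark numbers like those above are then averaged over the evaluation set (and scaled to percent).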
