R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
About
Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT Cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
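To make the idea of an outcome-supervised reward that also favors internal knowledge more concrete, here is a minimal sketch. It is not the reward used in the paper: the `Rollout` fields, the `rollout_reward` function, the group-minimum comparison, and all numeric values are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response to a question (hypothetical structure)."""
    answer_correct: bool   # outcome signal: did the final answer match the reference?
    format_ok: bool        # did the response follow the expected output format?
    num_retrievals: int    # how many external search calls the response made


def rollout_reward(
    rollout: Rollout,
    group_min_retrievals: int,
    format_penalty: float = -1.0,   # assumed value
    internal_bonus: float = 0.5,    # assumed value
) -> float:
    """Score a rollout from its outcome, with a bonus for relying on internal knowledge.

    Assumption: among correct rollouts sampled for the same question, the one
    that used the fewest retrievals earns an extra bonus, which nudges the
    policy to answer from memory when it can.
    """
    if not rollout.format_ok:
        return format_penalty
    if not rollout.answer_correct:
        return 0.0
    reward = 1.0
    if rollout.num_retrievals <= group_min_retrievals:
        reward += internal_bonus
    return reward
```

Under these assumptions, a correct answer that needs no more retrievals than the best rollout in its group scores 1.5, a correct answer that searches more scores 1.0, and a malformed response scores -1.0 regardless of its answer.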
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Open-domain Question Answering | 2WikiMultiHopQA (in-domain) | F1 | 61.2 | 57 |
| Open-domain Question Answering | MuSiQue (out-of-domain) | F1 | 33.8 | 57 |
| Open-domain Question Answering | HotpotQA (in-domain) | F1 | 59 | 57 |
| Open-domain Question Answering | Bamboogle v1 (out-of-domain) | F1 | 60.8 | 33 |
| Open-domain Question Answering | Bamboogle (out-of-domain) | F1 | 60.8 | 24 |
| Multi-hop Question Answering | HotpotQA | CEM (%) | 64.2 | 12 |
| Multi-hop Question Answering | Bamboogle | CEM (%) | 58.7 | 12 |
| Multi-hop Question Answering | MuSiQue | CEM (%) | 32.3 | 12 |
| Question Answering | PopQA | CEM (%) | 59 | 12 |
| Multi-hop Question Answering | 2Wiki | CEM (%) | 63.2 | 12 |
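
For readers unfamiliar with the two metrics in the table, the sketch below shows how they are commonly computed for short-answer QA. It assumes that F1 is token-level overlap F1 and that CEM stands for cover exact match (the gold answer appears inside the prediction); the page does not spell out either definition, and the normalization rules and function names here are illustrative rather than taken from the paper's evaluation code.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between prediction and gold."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def cover_exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized gold answer occurs as a token span in the prediction."""
    pred = " ".join(normalize(prediction))
    ref = " ".join(normalize(gold))
    # Pad with spaces so that matches align on token boundaries.
    return float(bool(ref) and f" {ref} " in f" {pred} ")
```

Both functions return values in [0, 1]; averaging over a dataset and multiplying by 100 gives figures on the same scale as the table. Cover exact match is more lenient than strict exact match, since a longer prediction still scores 1.0 as long as it contains the gold answer.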