R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

About

Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models (LLMs). While they achieve remarkable performance on challenging tasks such as mathematics and coding, they often rely on their internal knowledge to solve problems, which can be inadequate for time-sensitive or knowledge-intensive questions, leading to inaccuracies and hallucinations. To address this, we propose R1-Searcher, a novel two-stage outcome-based RL approach designed to enhance the search capabilities of LLMs. This method allows LLMs to autonomously invoke external search systems to access additional knowledge during the reasoning process. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start, while generalizing effectively to out-of-domain datasets and supporting both Base and Instruct models. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.

Huatong Song, Jinhao Jiang, Yingqian Min, Jie Chen, Zhipeng Chen, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen • 2025
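
The abstract describes a model that calls an external search system on its own while it reasons. As a rough illustration of that control flow only, the following is a minimal sketch and not the authors' implementation: the `<query>` and `<documents>` tag names, the `reason_with_search` function, and the `generate` and `search` callables are all hypothetical placeholders, and the RL training itself is not shown.

```python
import re
from typing import Callable, List

# Hypothetical tag format, assumed for illustration; the abstract does not specify one.
QUERY_RE = re.compile(r"<query>(.*?)</query>", re.DOTALL)


def reason_with_search(
    question: str,
    generate: Callable[[str], str],      # assumed: maps the context so far to the next text chunk
    search: Callable[[str], List[str]],  # assumed: maps a query string to retrieved passages
    max_calls: int = 4,
) -> str:
    """Alternate generation and retrieval until the model stops asking for evidence."""
    context = f"Question: {question}\n"
    for _ in range(max_calls):
        step = generate(context)
        context += step
        match = QUERY_RE.search(step)
        if match is None:
            # No search request in this step, so treat the output as final.
            return context
        # Run the requested search and append the results for the next step.
        docs = search(match.group(1).strip())
        context += "\n<documents>\n" + "\n".join(docs) + "\n</documents>\n"
    # Search budget exhausted; return whatever has been produced so far.
    return context
```

In this sketch the model, not the caller, decides when to retrieve: the loop only reacts to a query that appears in the generated text, and `max_calls` caps the number of retrieval rounds.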

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	EM 51.3	559
Mathematical Reasoning	MATH	Accuracy 67.6	535
Multi-hop Question Answering	HotpotQA (test)	F1 50.69	311
Multi-hop Question Answering	HotpotQA	--	294
Question Answering	2Wiki	EM 46.99	241
Question Answering	Bamboogle	EM 44	227
Multi-hop Question Answering	2WikiMultiHopQA (test)	EM 27.34	226
Multi-hop Question Answering	2Wiki	Exact Match 58.3	215
Multi-hop Question Answering	MuSiQue	EM 18.6	209
Mathematical Reasoning	AMC 23	Accuracy 37.5	198

Showing 10 of 157 rows
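
Most of the question-answering rows above report EM (exact match) or F1. The listing does not say how these were computed, so the following is only a sketch of the commonly used SQuAD-style convention, assumed here: answers are lowercased, stripped of punctuation and English articles, and then compared as strings (EM) or as token multisets (F1). The function names are illustrative. The scores in the table appear to be these values expressed as percentages.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "Eiffel Tower in Paris" against the gold answer "Eiffel Tower" gets an EM of 0 but an F1 of 2/3, since both gold tokens are recovered while half of the predicted tokens are extra.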
