Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

About

Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Prompting advanced LLMs with reasoning capabilities to use search engines during inference is often suboptimal, because the LLM may not have learned how to interact optimally with the search engine. This paper introduces Search-R1, an extension of reinforcement learning (RL) for reasoning frameworks, in which the LLM learns to autonomously generate (multiple) search queries during step-by-step reasoning with real-time retrieval. Search-R1 optimizes LLM reasoning trajectories with multi-turn search interactions, leveraging retrieved token masking for stable RL training and a simple outcome-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 41% (Qwen2.5-7B) and 20% (Qwen2.5-3B) over various RAG baselines under the same setting. The paper further provides empirical insights into RL optimization methods, LLM choices, and response length dynamics in retrieval-augmented reasoning. The code and model checkpoints are available at https://github.com/PeterGriffinJin/Search-R1.

Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han • 2025
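
The retrieved token masking mentioned in the abstract can be illustrated with a minimal sketch. It assumes that retrieved passages are wrapped in `<information>` ... `</information>` delimiters and that the rollout is split on whitespace; both are illustrative assumptions, as are the helper names `loss_mask` and `masked_mean`. The idea shown is only that tokens inserted by the search engine are excluded from the training loss, so the policy is updated solely on tokens it generated itself. This is not the released implementation.

```python
# Assumed delimiters for retrieved passages (illustrative, not taken from the abstract).
INFO_OPEN, INFO_CLOSE = "<information>", "</information>"


def loss_mask(tokens):
    """Return 1 for tokens the policy generated, 0 for retrieved tokens and their delimiters."""
    mask, inside = [], False
    for tok in tokens:
        if tok == INFO_OPEN:
            inside = True
            mask.append(0)
        elif tok == INFO_CLOSE:
            inside = False
            mask.append(0)
        else:
            mask.append(0 if inside else 1)
    return mask


def masked_mean(per_token_loss, mask):
    """Average the per-token loss over unmasked (policy-generated) tokens only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(l * m for l, m in zip(per_token_loss, mask)) / kept


# A toy rollout: one search call, one retrieved passage, one final answer.
rollout = (
    "<search> capital of France </search> "
    "<information> Paris is the capital . </information> "
    "<answer> Paris </answer>"
).split()
mask = loss_mask(rollout)

# Retrieved tokens carry a large hypothetical loss, but they do not affect the average.
losses = [5.0 if m == 0 else 1.0 for m in mask]
print(masked_mean(losses, mask))
```

In a real training loop the mask would be built over tokenizer IDs and multiplied into the policy-gradient loss tensor; the whitespace tokens here only keep the example self-contained.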

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	EM 72.1	559
Multi-hop Question Answering	HotpotQA (test)	F1 54.18	311
Multi-hop Question Answering	HotpotQA	--	294
Question Answering	2Wiki	EM 51.8	241
Mathematical Reasoning	MATH 500	--	236
Question Answering	Bamboogle	EM 50.4	227
Multi-hop Question Answering	2WikiMultiHopQA (test)	EM 73.6	226
Multi-hop Question Answering	2Wiki	Exact Match 54.6	215
Multi-hop Question Answering	MuSiQue	EM 28.8	209
Mathematical Reasoning	AIME 25	Accuracy 7.3	201

Showing 10 of 514 rows
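
For readers unfamiliar with the EM and F1 results in the table, the following is a minimal sketch of how exact match and token-level F1 are commonly computed for question answering. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the individual benchmarks listed here may normalize differently, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred, ref = normalize(prediction).split(), normalize(gold).split()
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(f1_score("Paris France", "Paris"))
```

Scores are typically averaged over all questions in a dataset and reported on a 0 to 100 scale.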
