
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

About

Large language models (LLMs) have made notable progress in multi-step and long-chain reasoning. However, extending their reasoning capabilities to encompass deep interaction with search remains a non-trivial challenge, as models often fail to identify optimal reasoning-search interaction trajectories, resulting in suboptimal responses. We propose R-Search, a novel reinforcement learning framework for Reasoning-Search integration, designed to enable LLMs to autonomously execute multi-step reasoning with deep search interaction and to learn optimal reasoning-search interaction trajectories via multi-reward signals, improving response quality on complex logic- and knowledge-intensive tasks. R-Search guides the LLM to dynamically decide when to retrieve and when to reason, while globally integrating key evidence to deepen the knowledge interaction between reasoning and search. During RL training, R-Search provides multi-stage, multi-type rewards to jointly optimize the reasoning-search trajectory. Experiments on seven datasets show that R-Search outperforms advanced RAG baselines by up to 32.2% (in-domain) and 25.1% (out-of-domain). The code and data are available at https://github.com/QingFei1/R-Search.
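The interleaved decide-to-retrieve-or-reason loop described above can be sketched minimally as follows. This is an illustrative assumption, not the paper's actual protocol: the tag names (`<search>`, `<answer>`), the stop condition, and the stubbed model and retriever are all hypothetical placeholders chosen for the sketch; see the linked repository for the real implementation.

```python
import re

def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (hypothetical behavior for the demo)."""
    if "Observation" not in prompt:
        return "I need a fact first. <search>capital of France</search>"
    return "<answer>Paris</answer>"

def stub_retriever(query: str) -> str:
    """Placeholder for a real search/retrieval backend."""
    return "Paris is the capital of France."

def reason_search_loop(question, llm, retriever, max_turns=4):
    """Alternate reasoning and retrieval until the model emits an answer."""
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        out = llm(prompt)
        search = re.search(r"<search>(.*?)</search>", out, re.S)
        if search:
            # Model chose to retrieve: append the evidence and continue reasoning.
            docs = retriever(search.group(1).strip())
            prompt += out + f"\nObservation: {docs}\n"
            continue
        answer = re.search(r"<answer>(.*?)</answer>", out, re.S)
        if answer:
            return answer.group(1).strip()
        prompt += out  # pure reasoning step, no retrieval
    return None

print(reason_search_loop("What is the capital of France?",
                         stub_llm, stub_retriever))
```

In the RL setting, trajectories produced by such a loop would be scored by the multi-reward signals and used to update the policy; the sketch only shows the rollout side.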

Qingfei Zhao, Ruobing Wang, Dingling Xu, Daren Zha, Limin Liu• 2025

Related benchmarks

Task                          Dataset                 Metric  Result  Rank
Multi-hop Question Answering  2WikiMultihopQA         EM      32      387
Multi-hop Question Answering  Bamboogle               EM      20.8    128
Multi-hop Question Answering  HotpotQA                EM      30.7    117
Question Answering            NQ (Natural Questions)  EM      31.9    70
Multi-hop Question Answering  MuSiQue                 EM      11.9    58
General Question Answering    TriviaQA                EM      54.1    54
General Question Answering    PopQA                   EM      36.5    51
