ZeroSearch: Incentivize the Search Capability of LLMs without Searching

About

Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZeroSearch, a novel RL framework that incentivizes the capabilities of LLMs to use a real search engine with simulated searches during training. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both useful and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZeroSearch effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it. Furthermore, it generalizes well across both base and instruction-tuned models of various parameter sizes and is compatible with a wide range of RL algorithms.
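The curriculum-based rollout described above can be illustrated with a small sketch. The abstract does not give the schedule's functional form, so the exponential interpolation below is an assumption, as are the names `noise_probability` and `choose_document_mode` and the parameters `p_start`, `p_end`, and `base`. The only idea taken from the text is that the chance of serving a noisy simulated document rises as training progresses.

```python
import random


def noise_probability(step, total_steps, p_start=0.0, p_end=0.75, base=4.0):
    """Probability of requesting a noisy document at a given training step.

    Assumed schedule (not taken from the abstract): interpolate from
    p_start to p_end with an exponential weight, so degradation is slow
    early in training and faster later.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    if base == 1.0:
        weight = frac  # degenerate case: linear schedule
    else:
        weight = (base ** frac - 1.0) / (base - 1.0)
    return p_start + weight * (p_end - p_start)


def choose_document_mode(step, total_steps, rng=random):
    """Sample whether the simulated retriever should return a 'noisy' or 'useful' document."""
    p_noisy = noise_probability(step, total_steps)
    return "noisy" if rng.random() < p_noisy else "useful"


if __name__ == "__main__":
    total = 200
    for step in (0, 50, 100, 150, 200):
        print(step, round(noise_probability(step, total), 3))
```

In a training loop, the sampled mode would select which prompt the fine-tuned simulation LLM receives for each query. Under this assumed schedule, a larger `base` keeps documents clean for longer before the noise rate climbs.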

Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Yan Zhang, Fei Huang, Jingren Zhou • 2025

Related benchmarks

Task | Dataset | Result | Rank
Multi-hop Question Answering | 2WikiMultihopQA | EM 39 | 387
Multi-hop Question Answering | HotpotQA | -- | 294
Multi-hop Question Answering | HotpotQA (test) | -- | 255
Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM 34.6 | 195
Question Answering | PopQA | -- | 186
Multi-hop Question Answering | MuSiQue | EM 20 | 185
Question Answering | TriviaQA | EM 60.1 | 182
Multi-hop Question Answering | 2Wiki | -- | 152
Question Answering | 2Wiki | -- | 152
Multi-hop Question Answering | Bamboogle | Exact Match 27.8 | 128
Showing 10 of 86 rows
...
