
ZeroSearch: Incentivize the Search Capability of LLMs without Searching

About

Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZeroSearch, a novel RL framework that incentivizes the capabilities of LLMs to use a real search engine with simulated searches during training. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both useful and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZeroSearch effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it. Furthermore, it generalizes well across both base and instruction-tuned models of various parameter sizes and is compatible with a wide range of RL algorithms.
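The curriculum-based rollout described above can be illustrated with a small sketch. The abstract does not give the schedule, so everything here is an assumption for illustration: the function name `noise_probability`, the start and end probabilities, the exponential interpolation controlled by `base`, and the `generate_document` callback standing in for the fine-tuned retrieval LLM are all hypothetical.

```python
import random


def noise_probability(step, total_steps, p_start=0.0, p_end=0.75, base=4.0):
    """Chance of asking the simulated retriever for a noisy document.

    Hypothetical schedule: interpolates from p_start to p_end with an
    exponential ramp, so degradation is gentle early and steeper later.
    `base` must not equal 1 (the ramp would divide by zero).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if base == 1.0:
        raise ValueError("base must differ from 1")
    # Clamp training progress to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    # Ramp goes from 0 at frac=0 to 1 at frac=1.
    ramp = (base ** frac - 1.0) / (base - 1.0)
    return p_start + ramp * (p_end - p_start)


def simulate_search(query, step, total_steps, rng, generate_document):
    """Return one simulated document, noisy with the scheduled probability.

    `generate_document(query, noisy)` is a placeholder for the retrieval
    module; it is not an API from the paper.
    """
    noisy = rng.random() < noise_probability(step, total_steps)
    return generate_document(query, noisy)


if __name__ == "__main__":
    rng = random.Random(0)

    def fake_retriever(query, noisy):
        # Stand-in for the fine-tuned LLM that writes documents.
        return f"[{'noisy' if noisy else 'useful'}] document for: {query}"

    for step in (0, 50, 100):
        print(step, round(noise_probability(step, 100), 3),
              simulate_search("who wrote Hamlet?", step, 100, rng, fake_retriever))
```

With a schedule of this shape the policy model sees mostly useful documents early in training and an increasing share of noisy ones later, which is the progression the abstract describes.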

Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Yan Zhang, Fei Huang, Jingren Zhou • 2025

Related benchmarks

Task                           Dataset                                 Result                      Rank
Multi-hop Question Answering   2WikiMultihopQA                         EM 39                       278
Multi-hop Question Answering   HotpotQA                                --                          221
Question Answering             PopQA                                   --                          186
Multi-hop Question Answering   MuSiQue                                 EM 20                       106
Multi-hop Question Answering   Bamboogle                               Exact Match 27.8            97
Question Answering             2Wiki                                   --                          75
Single-hop Question Answering  TriviaQA                                EM 66.2                     62
Question Answering             NQ                                      EM 43.8                     57
Single-hop Question Answering  PopQA                                   EM 58.2                     55
Question Answering             General QA NQ, TriviaQA, PopQA (test)   Overall Average Score 34.5  49
Showing 10 of 47 rows
