DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
About
Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. To encourage diversity and reduce redundancy, we design a redundancy penalty that discourages repeated similar queries. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
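The redundancy penalty mentioned above can be illustrated with a small sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the token-level Jaccard similarity, the mean-over-pairs aggregation, the `weight` coefficient, and the function names `jaccard`, `redundancy_penalty`, and `shaped_reward` are all hypothetical and not taken from the DeepDive code.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two query strings (assumed metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty queries are treated as identical
    return len(ta & tb) / len(ta | tb)


def redundancy_penalty(queries: list[str]) -> float:
    """Mean pairwise similarity over all query pairs in one trajectory.

    Returns 0.0 when there are fewer than two queries, since nothing can repeat.
    """
    pairs = list(combinations(queries, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def shaped_reward(correct: bool, queries: list[str], weight: float = 0.1) -> float:
    """Outcome reward minus a weighted redundancy penalty (hypothetical shaping)."""
    base = 1.0 if correct else 0.0
    return base - weight * redundancy_penalty(queries)


if __name__ == "__main__":
    diverse = ["capital of France", "Eiffel Tower architect", "1889 world fair"]
    repetitive = ["capital of France", "capital of France", "France capital city"]
    print(shaped_reward(True, diverse))
    print(shaped_reward(True, repetitive))
```

Under these assumptions, a trajectory that reissues near-identical searches receives a lower reward than one that reaches the same answer with varied queries, which is the behavior the penalty is meant to encourage.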
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Deep Research | BrowseComp-ZH (BC-zh) original (test) | Pass@1 | 29.7 | 45 |
| Deep search | BrowseComp-ZH (test) | Accuracy | 29.7 | 27 |
| Deep search | BrowseComp (test) | Accuracy | 15.3 | 27 |
| Deep search | Xbench-DeepSearch (test) | Accuracy | 51.8 | 26 |
| Deep-search QA | BrowseComp (test) | Pass@1 | 15.3 | 24 |
| Deep-search QA | Xbench-DeepSearch (test) | Pass@1 | 51.8 | 24 |
| Information Seeking | BrowseComp standard (full) | Pass@1 | 14.8 | 20 |
| Information Seeking | Browsecomp | Success Rate | 15.3 | 19 |
| Information Seeking | BrowseComp Chinese (full) | Pass@1 | 25.6 | 19 |
| Information Seeking | xBench-DS | Success Rate | 51.8 | 18 |