
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

About

Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. To encourage diversity and reduce redundancy, we design a redundancy penalty that discourages repeated similar queries. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
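The redundancy penalty mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how query similarity is measured or how the penalty enters the reward, so everything below is an assumption for illustration only: token-level Jaccard similarity as the similarity measure, the mean over all query pairs as the penalty, and the hypothetical names `jaccard`, `redundancy_penalty`, `shaped_reward`, and `weight`. This is not the authors' implementation.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two queries (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty queries are treated as identical
    return len(ta & tb) / len(ta | tb)


def redundancy_penalty(queries: list[str]) -> float:
    """Mean pairwise similarity of the search queries in one trajectory, in [0, 1]."""
    pairs = list(combinations(queries, 2))
    if not pairs:
        return 0.0  # zero or one query: nothing can be redundant
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def shaped_reward(task_reward: float, queries: list[str], weight: float = 0.1) -> float:
    """Task reward minus a weighted redundancy penalty (weight is hypothetical)."""
    return task_reward - weight * redundancy_penalty(queries)
```

Under these assumptions, a trajectory that issues the same query repeatedly receives the full penalty, while one whose queries share no tokens receives none, so the shaped reward favors more varied searches.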

Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Deep Research | BrowseComp-ZH (BC-zh) original (test) | Pass@1 | 29.7 | 45
Deep Research | Browsecomp | Pass@1 | 15.3 | 33
Deep search | BrowseComp-ZH (test) | Accuracy | 29.7 | 27
Deep search | BrowseComp (test) | Accuracy | 15.3 | 27
Deep search | Xbench-DeepSearch (test) | Accuracy | 51.8 | 26
Deep-search QA | BrowseComp (test) | Pass@1 | 15.3 | 24
Deep-search QA | Xbench-DeepSearch (test) | Pass@1 | 51.8 | 24
Multi-step navigation and information location | BrowseComp English | Score | 15.3 | 22
Deep Information Search and Synthesis | xbench DeepSearch | Score | 51.8 | 22
Multi-step navigation and information location | BrowseComp-ZH | Score | 29.7 | 21
Showing 10 of 17 rows
