DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

About

Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents for solving complex, real-world tasks. Yet open LLMs still perform poorly in such settings because of their limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. To encourage diversity and reduce redundancy, we design a redundancy penalty that discourages repeated similar queries. Experiments show that DeepDive-32B achieves a new competitive result among open-source models on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and contributes significantly to the performance improvements across multiple benchmarks. We also observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
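The abstract does not say how the redundancy penalty is computed, so the following is only a minimal sketch of one plausible form, not the authors' implementation. It assumes the penalty is the mean pairwise Jaccard similarity between the token sets of the search queries issued in one trajectory, subtracted from the task reward with a hypothetical weight. The names `tokenize`, `redundancy_penalty`, `shaped_reward`, and `weight` are illustrative and do not come from the DeepDive code.

```python
from itertools import combinations


def tokenize(query: str) -> frozenset[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return frozenset(query.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def redundancy_penalty(queries: list[str]) -> float:
    """Mean pairwise Jaccard similarity over the queries of one trajectory.

    Assumption: similarity is measured on token overlap. Returns 0.0 when
    there are fewer than two queries, since nothing can be repeated.
    """
    if len(queries) < 2:
        return 0.0
    token_sets = [tokenize(q) for q in queries]
    pairs = list(combinations(token_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def shaped_reward(task_reward: float, queries: list[str], weight: float = 0.1) -> float:
    """Task reward minus a weighted redundancy penalty (weight is hypothetical)."""
    return task_reward - weight * redundancy_penalty(queries)
```

Under this sketch, a trajectory that keeps re-issuing the same query is penalized by the full `weight`, while a trajectory whose queries share no tokens keeps its task reward unchanged.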

Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong • 2025

Related benchmarks

Task	Dataset	Result
Deep Research	BrowseComp	Score 14.8	47
Deep Research	BrowseComp-ZH (BC-zh) original (test)	Pass@1 29.7	45
Deep Research	BrowseComp	Pass@1 15.3	33
Deep search	BrowseComp-ZH (test)	Accuracy 29.7	27
Deep search	BrowseComp (test)	Accuracy 15.3	27
Deep search	Xbench-DeepSearch (test)	Accuracy 51.8	26
Deep-search QA	BrowseComp (test)	Pass@1 15.3	24
Deep-search QA	Xbench-DeepSearch (test)	Pass@1 51.8	24
Deep Research	xBench-DS-2505	Score 50.5	22
Multi-step navigation and information location	BrowseComp English	Score 15.3	22

Showing 10 of 19 rows
