OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents

About

Deep research agents have shown remarkable potential in handling long-horizon tasks. However, state-of-the-art performance typically relies on online reinforcement learning (RL), which is financially expensive due to extensive API calls. While offline training offers a more efficient alternative, its progress is hindered by the scarcity of high-quality research trajectories. In this paper, we demonstrate that expensive online reinforcement learning is not all you need to build powerful research agents. To bridge this gap, we introduce a fully open-source suite designed for effective offline training. Our core contributions include DeepForge, a ready-to-use task synthesis framework that generates large-scale research queries without heavy preprocessing; and a curated collection of 66k QA pairs, 33k SFT trajectories, and 21k DPO pairs. Leveraging these resources, we train OffSeeker (8B), a model developed entirely offline. Extensive evaluations across six benchmarks show that OffSeeker not only leads among similar-sized agents but also remains competitive with 30B-parameter systems trained via heavy online RL.

Yuhang Zhou, Kai Zheng, Qiguang Chen, Mengkang Hu, Qingfeng Sun, Can Xu, Jingjing Chen • 2026

Related benchmarks

Task	Dataset	Result
Deep Research	Browsecomp	Score 12.8	47
Deep Research	BrowseComp-ZH (BC-zh) original (test)	Pass@1 26.6	45
Deep Research	xBench-DS-2505	Score 49	22
Deep Research	BrowseComp-EN (BC-en) original (test)	Pass@1 12.8	20
Deep Research	GAIA text-only original (test)	Pass@1 51.5	20
Deep Research	GAIA Text-Only	Score 51.5	17
Deep Research	XBench-DeepSearch original (test)	Pass@1 49	15
Deep Research	WebWalkerQA original (test)	Pass@1 61.7	14
Deep Research	HLE text-only original (test)	Pass@1 13.8	13
