
OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents

About

Deep research agents have shown remarkable potential in handling long-horizon tasks. However, state-of-the-art performance typically relies on online reinforcement learning (RL), which is financially expensive due to extensive API calls. While offline training offers a more efficient alternative, its progress is hindered by the scarcity of high-quality research trajectories. In this paper, we demonstrate that expensive online reinforcement learning is not all you need to build powerful research agents. To bridge this gap, we introduce a fully open-source suite designed for effective offline training. Our core contributions are DeepForge, a ready-to-use task synthesis framework that generates large-scale research queries without heavy preprocessing, together with a curated collection of 66k QA pairs, 33k SFT trajectories, and 21k DPO pairs. Leveraging these resources, we train OffSeeker (8B), a model developed entirely offline. Extensive evaluations across six benchmarks show that OffSeeker not only leads among similar-sized agents but also remains competitive with 30B-parameter systems trained via heavy online RL.
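The abstract mentions training on DPO pairs as the preference-optimization stage of the offline pipeline. The paper's own training recipe is not shown here, so as a hedged illustration only, here is a minimal pure-Python sketch of the standard DPO loss for a single preference pair (all function and argument names are hypothetical, not from the paper):

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed token log-probability of a full
    response under the trainable policy or the frozen reference
    model; `beta` scales the implicit reward margin.
    """
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), written as log1p(exp(-margin)) for stability
    return math.log1p(math.exp(-margin))
```

When the policy matches the reference exactly, the margin is zero and the loss equals log 2; it shrinks as the policy prefers the chosen response more strongly than the reference does.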

Yuhang Zhou, Kai Zheng, Qiguang Chen, Mengkang Hu, Qingfeng Sun, Can Xu, Jingjing Chen • 2026

Related benchmarks

Task           Dataset                                 Metric   Result   Rank
Deep Research  BrowseComp-ZH (BC-zh) original (test)   Pass@1   26.6     45
Deep Research  BrowseComp-EN (BC-en) original (test)   Pass@1   12.8     20
Deep Research  GAIA text-only original (test)          Pass@1   51.5     20
Deep Research  XBench-DeepSearch original (test)       Pass@1   49       15
Deep Research  WebWalkerQA original (test)             Pass@1   61.7     14
Deep Research  HLE text-only original (test)           Pass@1   13.8     13
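All results above are Pass@1 scores: each task is attempted once, and the score is the percentage of tasks whose single answer is judged correct. As a sketch of how such a number is computed (the helper name and judging format are illustrative, not from the paper):

```python
def pass_at_1(results):
    """Pass@1 as a percentage.

    `results` maps a task id to a bool indicating whether the
    single sampled answer was judged correct.
    """
    if not results:
        return 0.0
    return 100.0 * sum(results.values()) / len(results)
```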
