
OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents

About

Deep research agents have shown remarkable potential in handling long-horizon tasks. However, state-of-the-art performance typically relies on online reinforcement learning (RL), which is financially expensive due to extensive API calls. While offline training offers a more efficient alternative, its progress is hindered by the scarcity of high-quality research trajectories. In this paper, we demonstrate that expensive online reinforcement learning is not all you need to build powerful research agents. To bridge this gap, we introduce a fully open-source suite designed for effective offline training. Our core contributions include DeepForge, a ready-to-use task synthesis framework that generates large-scale research queries without heavy preprocessing; and a curated collection of 66k QA pairs, 33k SFT trajectories, and 21k DPO pairs. Leveraging these resources, we train OffSeeker (8B), a model developed entirely offline. Extensive evaluations across six benchmarks show that OffSeeker not only leads among similar-sized agents but also remains competitive with 30B-parameter systems trained via heavy online RL.

Yuhang Zhou, Kai Zheng, Qiguang Chen, Mengkang Hu, Qingfeng Sun, Can Xu, Jingjing Chen • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Deep Research | Browsecomp | Score | 12.8 | 47 |
| Deep Research | BrowseComp-ZH (BC-zh) original (test) | Pass@1 | 26.6 | 45 |
| Deep Research | xBench-DS-2505 | Score | 49 | 22 |
| Deep Research | BrowseComp-EN (BC-en) original (test) | Pass@1 | 12.8 | 20 |
| Deep Research | GAIA text-only original (test) | Pass@1 | 51.5 | 20 |
| Deep Research | GAIA Text-Only | Score | 51.5 | 17 |
| Deep Research | XBench-DeepSearch original (test) | Pass@1 | 49 | 15 |
| Deep Research | WebWalkerQA original (test) | Pass@1 | 61.7 | 14 |
| Deep Research | HLE text-only original (test) | Pass@1 | 13.8 | 13 |
