O-Researcher: An Open-Ended Deep Research Model via Multi-Agent Distillation and Agentic RL
About
The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow in which collaborative AI agents simulate complex tool-integrated reasoning to generate diverse, high-fidelity data end-to-end. Leveraging this synthesized data, we develop a two-stage training strategy that combines supervised fine-tuning with a novel reinforcement learning method, designed to maximize model alignment and capability. Extensive experiments demonstrate that our framework empowers open-source models across multiple scales, enabling them to achieve new state-of-the-art performance on major deep research benchmarks. This work provides a scalable and effective pathway for advancing open-source LLMs without relying on proprietary data or models.
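The two-stage strategy above (supervised fine-tuning on multi-agent-synthesized trajectories, followed by a reinforcement learning stage) can be sketched in outline. This is a minimal toy sketch under stated assumptions: every function, the dict-based "model", and the reward function are hypothetical stand-ins for illustration, not the project's actual implementation.

```python
# Hedged sketch of the pipeline: multi-agent data synthesis -> SFT -> RL.
# The "model" is a plain dict and the reward is a toy heuristic; both are
# illustrative assumptions, not the paper's code.

def synthesize_trajectories(num_agents, tasks):
    """Stand-in for the multi-agent workflow: each agent contributes one
    tool-integrated reasoning trace per task."""
    return [
        {"task": t, "agent": a, "trace": f"plan->search->synthesize({t})"}
        for t in tasks
        for a in range(num_agents)
    ]

def sft_stage(model, data):
    """Stage 1: supervised fine-tuning on the synthesized trajectories."""
    model = dict(model)
    model["sft_steps"] = model.get("sft_steps", 0) + len(data)
    return model

def rl_stage(model, reward_fn, rollouts):
    """Stage 2: an RL step that scores research rollouts with a reward
    signal and (here, trivially) records the mean reward."""
    model = dict(model)
    model["rl_reward"] = sum(reward_fn(r) for r in rollouts) / len(rollouts)
    return model

tasks = ["survey LLM agents", "compare RL methods"]
data = synthesize_trajectories(num_agents=3, tasks=tasks)
model = sft_stage({"name": "open-llm"}, data)
model = rl_stage(
    model,
    reward_fn=lambda r: 1.0 if "synthesize" in r["trace"] else 0.0,
    rollouts=data,
)
print(model["sft_steps"], model["rl_reward"])  # 6 trajectories, mean reward 1.0
```

The point of the sketch is only the control flow: the synthesized data feeds both stages, with SFT consuming trajectories directly and RL consuming them as scored rollouts.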
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Deep Research | DeepResearch Bench official 100-task subset 1.0 | RACE Overall: 0.4848 | 24 |
| Deep Research Report Generation | DeepResearchGym Commercial 100 | KPR: 77.28 | 9 |