LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent
About
Reinforcement learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and dependence on real-world search during RL training introduces instability and prohibitive cost. LiteResearcher is a training framework that makes agentic RL scalable: by constructing a lite virtual world that mirrors real-world search dynamics, we enable a continuously improving training recipe that allows a tiny search agent to outperform large-scale open-source and commercial models (e.g., Tongyi DeepResearch and Claude-4.5 Sonnet). Specifically, on the common benchmarks GAIA and Xbench, our LiteResearcher-4B achieves open-source state-of-the-art results of 71.3% and 78.0%, respectively, demonstrating that scalable RL training is a key enabler for deep research agents.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reasoning | HLE | Accuracy (HLE Reasoning) | 22 | 63 |
| Complex Reasoning | GAIA text | Accuracy | 71.3 | 19 |
| Agentic Search | Xbench DeepSearch 2505 | Accuracy | 78 | 18 |
| Agentic Search | Browsecomp | Accuracy | 27.5 | 16 |
| Complex Reasoning | FRAMES | Accuracy | 83.1 | 13 |
| Agentic Search | WebWalker | Accuracy | 72.7 | 9 |
| Complex Reasoning | SEAL 0 | Accuracy (Seal-0) | 41.8 | 8 |