LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

About

Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and dependence on real-world search during RL training introduces instability and prohibitive cost. Together, these limit the scalability of agentic RL. LiteResearcher is a training framework that makes agentic RL scalable: by constructing a lite virtual world that mirrors real-world search dynamics, we enable a continuously improving training recipe that empowers a tiny search agent to outperform large-scale open-source and commercial models (e.g., Tongyi DeepResearch and Claude-4.5 Sonnet). Specifically, on common benchmarks such as GAIA and Xbench, our LiteResearcher-4B achieves open-source state-of-the-art results of 71.3% and 78.0%, respectively, demonstrating that scalable RL training is a key enabler for Deep Research Agents.

Wanli Li, Bince Qu, Bo Pan, Jianyu Zhang, Zheng Liu, Pan Zhang, Wei Chen, Bo Zhang • 2026

Related benchmarks

Task               Dataset                 Metric                    Result  Rank
Reasoning          HLE                     Accuracy (HLE Reasoning)  22      63
Complex Reasoning  GAIA text               Accuracy                  71.3    19
Agentic Search     Xbench DeepSearch 2505  Accuracy                  78      18
Agentic Search     Browsecomp              Accuracy                  27.5    16
Complex Reasoning  FRAMES                  Accuracy                  83.1    13
Agentic Search     WebWalker               Accuracy                  72.7    9
Complex Reasoning  SEAL 0                  Accuracy (Seal-0)         41.8    8
