LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

About

Reinforcement learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and dependence on real-world search during RL training introduces instability and prohibitive cost. LiteResearcher is a training framework that makes agentic RL scalable: by constructing a lite virtual world that mirrors real-world search dynamics, we enable a continuously improving training recipe that empowers a tiny search agent to outperform large-scale open-source and commercial models (e.g., Tongyi DeepResearch and Claude-4.5 Sonnet). Specifically, on common benchmarks such as GAIA and Xbench, our LiteResearcher-4B achieves open-source state-of-the-art results of 71.3% and 78.0%, respectively, demonstrating that scalable RL training is a key enabler for deep research agents.

Wanli Li, Bince Qu, Bo Pan, Jianyu Zhang, Zheng Liu, Pan Zhang, Wei Chen, Bo Zhang • 2026

Related benchmarks

Task	Dataset	Result
Reasoning	HLE	Accuracy (HLE Reasoning): 22	63
Complex Reasoning	GAIA text	Accuracy: 71.3	19
Agentic Search	Xbench DeepSearch 2505	Accuracy: 78	18
Agentic Search	Browsecomp	Accuracy: 27.5	16
Complex Reasoning	FRAMES	Accuracy: 83.1	13
Agentic Search	WebWalker	Accuracy: 72.7	9
Complex Reasoning	SEAL 0	Accuracy (Seal-0): 41.8	8
Deep research agent evaluation	GAIA	GAIA Score: 71.3	4

