WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

About

Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
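The lead-agent/subagent pattern described in the abstract can be sketched as follows. This is a minimal illustration only, assuming Python's asyncio for concurrency. `call_llm`, `plan_subtasks`, `run_subagent`, and `AgentContext` are hypothetical placeholders, not WideSeek-R1's actual code, and the MARL training loop is not shown. The sketch shows just the inference-time structure: one shared model function serves every role, each agent keeps its own isolated message history, and subagents run in parallel.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Isolated message history for one agent (hypothetical structure)."""
    role: str
    messages: list[str] = field(default_factory=list)


async def call_llm(ctx: AgentContext, prompt: str) -> str:
    """Hypothetical stand-in for the single shared LLM; returns a canned answer."""
    ctx.messages.append(prompt)
    await asyncio.sleep(0)  # yield control, as a real network call would
    answer = f"[{ctx.role}] result for: {prompt}"
    ctx.messages.append(answer)
    return answer


def plan_subtasks(task: str, n: int) -> list[str]:
    """Hypothetical lead-agent planning step: split a broad task into n shards."""
    return [f"{task} (shard {i + 1}/{n})" for i in range(n)]


async def run_subagent(i: int, subtask: str) -> str:
    # Each subagent starts from a fresh context: no history is shared.
    ctx = AgentContext(role=f"subagent-{i}")
    return await call_llm(ctx, subtask)


async def lead_agent(task: str, num_subagents: int) -> list[str]:
    lead_ctx = AgentContext(role="lead")
    subtasks = plan_subtasks(task, num_subagents)
    lead_ctx.messages.extend(subtasks)
    # Subagents run concurrently; asyncio.gather preserves input order.
    results = await asyncio.gather(
        *(run_subagent(i, s) for i, s in enumerate(subtasks))
    )
    # The lead agent only sees the subagents' final outputs, not their histories.
    lead_ctx.messages.extend(results)
    return list(results)


if __name__ == "__main__":
    for line in asyncio.run(lead_agent("compile a table of items", 3)):
        print(line)
```

Raising `num_subagents` widens the fan-out without lengthening any single agent's context, which is one way to read the width-scaling idea above.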

Zelai Xu, Zhexuan Xu, Ruize Zhang, Chunyang Zhu, Shi Yu, Weilin Liu, Quanlu Zhang, Wenbo Ding, Chao Yu, Yu Wang • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Multi-hop Question Answering	2WikiMultihopQA	--	--	559
Single-hop Question Answering	PopQA	--	--	186
Single-hop Question Answering	TriviaQA	--	--	133
Broad Information Seeking	WideSearch	Item F1 (Avg@4)	40	34
Agentic Tool Use	Agentic Macro-aggregate	Pass@1	35.8	22
Reading Comprehension	Reading Macro-aggregate	Pass@1	51	22
Knowledge Retrieval	Knowledge Macro-aggregate	Pass@1	58.1	22
Math Problem Solving	Math Macro-aggregate	Pass@1	41.2	22
Code and Software Engineering	Code/SE Macro-aggregate	Pass@1	46.7	22
Multi-hop Question Answering	HotpotQA	Avg@4	64.2	9

Showing 10 of 13 rows
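The Item F1 (Avg@4) entry can be illustrated with a short sketch. It assumes each output table is flattened into cell-level items (row key, column, value) that are matched exactly after simple normalization; WideSearch's official evaluator may normalize or match cells differently, so `to_items`, `normalize`, and `avg_at_k` are hypothetical helpers. Avg@4 is read here as the mean score over four independent runs of the same task.

```python
from statistics import mean

# Assumed item representation: (row key, column, normalized cell value).
Item = tuple[str, str, str]


def normalize(value: str) -> str:
    """Assumed normalization: strip whitespace and case-fold."""
    return value.strip().lower()


def to_items(table: dict[str, dict[str, str]]) -> set[Item]:
    """Flatten {row_key: {column: value}} into a set of cell-level items."""
    return {
        (normalize(row), normalize(col), normalize(val))
        for row, cells in table.items()
        for col, val in cells.items()
    }


def item_f1(pred: dict[str, dict[str, str]], gold: dict[str, dict[str, str]]) -> float:
    """Harmonic mean of item-level precision and recall under exact matching."""
    p, g = to_items(pred), to_items(gold)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


def avg_at_k(scores: list[float]) -> float:
    """Avg@k: mean score over k independent runs of the same task."""
    return mean(scores)


if __name__ == "__main__":
    gold = {
        "Paris": {"country": "France", "population": "2.1M"},
        "Berlin": {"country": "Germany", "population": "3.6M"},
    }
    pred = {
        "Paris": {"country": "France", "population": "2.2M"},
        "Berlin": {"country": "Germany"},
    }
    print(round(item_f1(pred, gold), 3))  # 0.571 (precision 2/3, recall 1/2)
```

Cell-level matching gives partial credit for partially correct rows, which a row-level exact match would not.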
