Small Model as Master Orchestrator: Learning Unified Agent-Tool Orchestration with Parallel Subtask Decomposition

About

Multi-agent systems (MAS) demonstrate clear advantages in tackling complex problems by coordinating diverse agents and external tools. However, most existing orchestration methods rely on static workflows or serial agent scheduling, and are further constrained by heterogeneous interface protocols between tools and agents. This leads to high system complexity and poor extensibility. To mitigate these issues, we propose Agent-as-Tool, a unified parallel orchestration paradigm that abstracts both agents and tools into a standardized, learnable action space with protocol normalization and explicit state feedback. Building on this paradigm, we train a lightweight orchestrator, ParaManager, which decouples planning decisions from subtask solving, enabling state-aware parallel subtask decomposition, delegation, and asynchronous execution. ParaManager is trained with a two-stage pipeline: supervised fine-tuning (SFT) on trajectories equipped with recovery mechanisms, which improves robustness, followed by reinforcement learning (RL) to balance task success, protocol compliance, diversity, and reasoning efficiency. Experiments show that ParaManager achieves strong performance across multiple benchmarks and generalizes robustly to unseen model pools.
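To make the idea of a unified action space concrete, the following is a minimal sketch, not the paper's implementation. It assumes that agents and tools can both be wrapped as async callables behind one registry, that every call returns a normalized result record carrying explicit success or failure state, and that independent subtasks are dispatched concurrently. All names here (`ActionResult`, `REGISTRY`, `calculator_tool`, `solver_agent`, `run_plan`) are hypothetical and chosen only for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Tuple


@dataclass
class ActionResult:
    """Normalized record returned for every action, agent or tool alike."""
    name: str
    ok: bool       # explicit state feedback: did the call succeed?
    output: str    # result text, or an error description when ok is False


# Hypothetical unified signature: every agent and tool takes a subtask string.
Action = Callable[[str], Awaitable[str]]


async def calculator_tool(subtask: str) -> str:
    # Stand-in tool (assumption): adds two integers written as "a+b".
    a, b = subtask.split("+")
    return str(int(a) + int(b))


async def solver_agent(subtask: str) -> str:
    # Stand-in agent (assumption): a real system would call a model here.
    await asyncio.sleep(0)
    return f"answer to: {subtask}"


REGISTRY: Dict[str, Action] = {
    "calculator": calculator_tool,
    "solver_agent": solver_agent,
}


async def call(name: str, subtask: str) -> ActionResult:
    # Wrap every call so failures come back as state instead of exceptions.
    try:
        output = await REGISTRY[name](subtask)
        return ActionResult(name, True, output)
    except Exception as exc:
        return ActionResult(name, False, f"{type(exc).__name__}: {exc}")


async def dispatch(plan: List[Tuple[str, str]]) -> List[ActionResult]:
    # Run all subtasks concurrently; gather preserves the plan's order.
    return await asyncio.gather(*(call(n, s) for n, s in plan))


def run_plan(plan: List[Tuple[str, str]]) -> List[ActionResult]:
    return asyncio.run(dispatch(plan))
```

Calling `run_plan([("calculator", "2+3"), ("solver_agent", "x"), ("missing", "y")])` returns three `ActionResult` records in plan order. The third has `ok=False` because the name is not registered, which is the kind of explicit state feedback an orchestrator could use to re-plan or recover.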

Wenzhen Yuan, Wutao Xiong, Fanchen Yu, Shengji Tang, Ting Liu, Tao Chen, Peng Ye, Yuzhuo Fu, Wanli Ouyang, Lei Bai • 2026

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	AMC	Accuracy (%)	95.18	375
Mathematical Reasoning	AIME 24	Accuracy	86.67	358
Mathematical Reasoning	HMMT25	Accuracy (%)	63.33	115
Mathematical Problem Solving	AIME 2024	Accuracy	86.67	113
General Knowledge Reasoning	MMLU-Pro	Accuracy	81.43	64
Code Generation	LCB v6	Accuracy	42.5	54
Mathematical Problem Solving	AIME 2025	Top-1 Accuracy (%)	90	46
Code Generation	LCB v5	Accuracy	39.45	45
Mathematical Reasoning	AIME25	Accuracy	84.17	25
Scientific Reasoning	GPQA	Accuracy (%)	72.1	25

Showing 10 of 14 rows
