
Multi-Agent Collaboration via Evolving Orchestration

About

Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator's evolution. Our code is available at https://github.com/OpenBMB/ChatDev/tree/puppeteer.
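The orchestration loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the integer "task state", the three agents, the reward shaping (`STEP_COST`), and the tabular softmax policy trained with a basic REINFORCE update are all assumptions chosen to keep the example small. It only shows the general shape of a central policy that picks one agent per step from the current state and is rewarded for solving the task in few steps.

```python
import math
import random

# Hypothetical toy "puppets": each agent maps an integer task state to a new state.
AGENTS = {
    "add_one": lambda s: s + 1,
    "double": lambda s: s * 2,
    "noop": lambda s: s,
}
NAMES = list(AGENTS)
# Assumed toy task: start at 1, reach exactly TARGET; every agent call costs STEP_COST.
TARGET, MAX_STEPS, STEP_COST = 8, 6, 0.1


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def run_episode(theta, rng, greedy=False):
    """Let the orchestrator pick one agent per step; return (trajectory, reward)."""
    state, traj = 1, []
    for _ in range(MAX_STEPS):
        # Tabular policy: one logit per agent for each state seen so far.
        probs = softmax(theta.setdefault(state, [0.0] * len(NAMES)))
        if greedy:
            action = max(range(len(NAMES)), key=probs.__getitem__)
        else:
            action = rng.choices(range(len(NAMES)), weights=probs)[0]
        traj.append((state, action, probs))
        state = AGENTS[NAMES[action]](state)
        if state >= TARGET:
            break
    # Reward: task success minus a cost per agent invocation.
    reward = (1.0 if state == TARGET else 0.0) - STEP_COST * len(traj)
    return traj, reward


def train(episodes=3000, lr=0.5, seed=0):
    """REINFORCE with a running-average baseline over the tabular logits."""
    rng = random.Random(seed)
    theta, baseline = {}, 0.0
    for _ in range(episodes):
        traj, reward = run_episode(theta, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for state, action, probs in traj:
            for a in range(len(NAMES)):
                # Gradient of log softmax w.r.t. logit a: 1[a == action] - probs[a].
                grad = (1.0 if a == action else 0.0) - probs[a]
                theta[state][a] += lr * advantage * grad
    return theta
```

Calling `train()` and then `run_episode(theta, random.Random(0), greedy=True)` rolls out the learned policy. Because each agent call is penalized, training should tend to favor shorter agent sequences, which loosely mirrors the compact reasoning structures the abstract reports.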

Yufan Dang, Chen Qian, Xueheng Luo, Jingru Fan, Zihao Xie, Ruijie Shi, Weize Chen, Cheng Yang, Xiaoyin Che, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Code Generation | HumanEval | Pass@1 75.3 | 1043 |
| Mathematical Reasoning | AMC | Accuracy (%) 93.98 | 368 |
| Mathematical Reasoning | AIME 24 | Accuracy 75.83 | 318 |
| General Reasoning | MMLU | MMLU Accuracy 84.3 | 180 |
| Mathematical Reasoning | AQUA | Accuracy 77.5 | 167 |
| Mathematical Reasoning | HMMT25 | Accuracy (%) 57.08 | 115 |
| Mathematical Problem Solving | AIME 2024 | Accuracy 80 | 113 |
| Math Word Problem Solving | GSM8K | Accuracy 93.3 | 111 |
| General Knowledge Reasoning | MMLU-Pro | Accuracy 80.54 | 64 |
| Code Generation | LCB v6 | Accuracy 41.06 | 49 |
Showing 10 of 29 rows
