
MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

About

LLM-based multi-agent systems (MAS) have shown significant potential in tackling diverse tasks. However, designing an effective MAS with existing approaches relies heavily on manual configuration or on multiple calls to advanced LLMs, which limits adaptability and incurs high inference costs. In this paper, we simplify the process of building an MAS by reframing it as a generative language task, where the input is a user query and the output is a corresponding MAS. To address this novel task, we unify the representation of an MAS as executable code and propose a consistency-oriented data construction pipeline to create a high-quality dataset of coherent and consistent query-MAS pairs. Using this dataset, we train MAS-GPT, an open-source medium-sized LLM capable of generating a query-adaptive MAS within a single LLM inference. The generated MAS can be seamlessly applied to process the user query and deliver a high-quality response. Extensive experiments on 9 benchmarks and 5 LLMs show that MAS-GPT consistently outperforms 10+ baseline MAS methods across diverse settings, indicating its effectiveness, efficiency, and strong generalization ability. Code will be available at https://github.com/rui-ye/MAS-GPT.
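The workflow described above can be sketched in a few lines of Python: a single model inference maps a user query to an MAS represented as executable code, and that code is then executed on the same query. Note that the function names (`generate_mas_code`, `run_mas`) and the canned two-agent template below are purely illustrative assumptions, not MAS-GPT's actual API; the trained model call is stubbed out.

```python
# Hedged sketch of the query -> MAS -> response pipeline from the abstract.
# All names are hypothetical; the "LLM inference" is stubbed with a fixed
# two-agent (solver + reviewer) template rather than a real model call.

def generate_mas_code(query: str) -> str:
    """Stand-in for a single MAS-GPT inference: return MAS source code."""
    # A real system would generate query-adaptive code with the trained
    # model here; we return a fixed pipeline for illustration.
    return '''
def solve(query):
    # Agent 1: draft an answer (stubbed).
    draft = f"draft answer to: {query}"
    # Agent 2: review and finalize the draft (stubbed).
    return f"reviewed({draft})"
'''

def run_mas(mas_code: str, query: str) -> str:
    """Execute the generated MAS code, then apply it to the user query."""
    namespace = {}
    exec(mas_code, namespace)          # load the generated solve() function
    return namespace["solve"](query)   # run the MAS on the query

if __name__ == "__main__":
    q = "What is 2 + 2?"
    print(run_mas(generate_mas_code(q), q))
```

Representing the MAS as plain code, as the paper proposes, is what makes this single-inference pattern possible: the output of one model call is directly executable, with no further orchestration calls needed.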

Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhenfei Yin, Siheng Chen, Jing Shao • 2025

Related benchmarks

Task                     | Dataset    | Result                | Rank
Mathematical Reasoning   | MATH       | Accuracy: 68.7        | 643
Mathematical Reasoning   | GSM8K      | Accuracy: 93.4        | 358
Question Answering       | GPQA       | Accuracy: 37.6        | 258
Code Generation          | HumanEval+ | --                    | 189
Out-of-Domain Reasoning  | GPQA       | Avg@8 Accuracy: 63.51 | 9
Mathematical Reasoning   | AIME 24    | Avg@8 Accuracy: 58.75 | 9
Multi-Agent Reasoning    | AIME 24    | Calls: 228            | 9
Multi-Agent Reasoning    | GPQA       | Calls: 1.52e+3        | 9
Mathematical Reasoning   | AIME 25    | Avg@8 Accuracy: 43.33 | 9
