# Multi-agent Architecture Search via Agentic Supernet

## About
Large Language Model (LLM)-empowered multi-agent systems extend the cognitive boundaries of individual agents through disciplined collaboration and interaction, yet constructing these systems often requires labor-intensive manual design. Although methods exist to automate the design of agentic workflows, they typically seek a static, complex, one-size-fits-all system, which fails to dynamically allocate inference resources based on the difficulty and domain of each query. To address this challenge, we shift away from the pursuit of a monolithic agentic system and instead optimize the **agentic supernet**, a probabilistic and continuous distribution over agentic architectures. We introduce MaAS, an automated framework that samples query-dependent agentic systems from the supernet, delivering high-quality solutions with tailored resource allocation (*e.g.*, LLM calls, tool calls, token cost). Comprehensive evaluation across six benchmarks demonstrates that MaAS **(I)** requires only 6–45% of the inference costs of existing handcrafted or automated multi-agent systems, **(II)** surpasses them by 0.54–11.82%, and **(III)** enjoys superior cross-dataset and cross-LLM-backbone transferability.
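To make the core idea concrete, here is a minimal sketch of query-dependent sampling from a layer-wise supernet: each layer holds a categorical distribution over agentic operators, and an early-exit operator lets easy queries terminate with a shallower, cheaper workflow. The operator names (`io`, `cot`, `multi_agent_debate`, `self_refine`, `early_exit`) and the function signatures are illustrative assumptions, not the actual MaAS implementation.

```python
import math
import random

# Hypothetical operator pool shared by every supernet layer.
# These names are placeholders, not the real MaAS operator set.
OPERATORS = ["io", "cot", "multi_agent_debate", "self_refine", "early_exit"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_architecture(layer_logits, rng=random):
    """Sample one agentic workflow from the supernet.

    layer_logits: one logit vector per supernet layer (len == len(OPERATORS)).
    Picks an operator per layer from its categorical distribution; sampling
    "early_exit" stops the workflow early, so easier queries consume fewer
    layers -- the tailored resource allocation described above.
    """
    workflow = []
    for logits in layer_logits:
        probs = softmax(logits)
        op = rng.choices(OPERATORS, weights=probs, k=1)[0]
        if op == "early_exit":
            break
        workflow.append(op)
    return workflow

# Example: a 3-layer supernet with (query-conditioned) logits.
rng = random.Random(0)
arch = sample_architecture([[2.0, 1.0, 0.0, 0.0, -1.0]] * 3, rng)
```

In training, the per-layer logits would be produced by a controller conditioned on the query, and their gradients would be estimated from task feedback; this sketch shows only the sampling step.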
## Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy: 92.3 | 983 |
| Code Generation | HumanEval | Pass@1: 92.85 | 850 |
| Mathematical Reasoning | GSM8K (test) | Accuracy: 96.4 | 797 |
| Mathematical Reasoning | MATH | Accuracy: 51.23 | 535 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy: 47.1 | 433 |
| Mathematical Reasoning | GSM8K | Accuracy: 92.3 | 351 |
| Multi-hop Question Answering | HotpotQA (test) | -- | 198 |
| Mathematical Reasoning | MATH | Accuracy: 52.25 | 162 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM: 23.1 | 143 |
| Mathematical Reasoning | AQUA | Accuracy: 76.2 | 132 |