# Multi-agent Architecture Search via Agentic Supernet

## About
Large Language Model (LLM)-empowered multi-agent systems extend the cognitive boundaries of individual agents through disciplined collaboration and interaction, yet constructing these systems often requires labor-intensive manual design. Although methods exist to automate the design of agentic workflows, they typically seek a static, complex, one-size-fits-all system, which fails to dynamically allocate inference resources based on the difficulty and domain of each query. To address this challenge, we shift away from the pursuit of a monolithic agentic system and instead optimize the **agentic supernet**, a probabilistic and continuous distribution over agentic architectures. We introduce MaAS, an automated framework that samples query-dependent agentic systems from the supernet, delivering high-quality solutions with tailored resource allocation (*e.g.*, LLM calls, tool calls, token cost). Comprehensive evaluation across six benchmarks demonstrates that MaAS **(I)** requires only 6–45% of the inference costs of existing handcrafted or automated multi-agent systems, **(II)** surpasses them by 0.54–11.82%, and **(III)** enjoys superior cross-dataset and cross-LLM-backbone transferability.
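To make the core idea concrete, here is a minimal sketch of query-dependent sampling from a layer-wise supernet: each layer holds a categorical distribution over agentic operators, and an early-exit operator lets easy queries terminate with a shallower, cheaper workflow. The operator names (`io`, `cot`, `multi_agent_debate`, `self_refine`, `early_exit`) and the function signatures are illustrative assumptions, not the actual MaAS implementation.

```python
import math
import random

# Hypothetical operator pool shared by every supernet layer.
# These names are placeholders, not the real MaAS operator set.
OPERATORS = ["io", "cot", "multi_agent_debate", "self_refine", "early_exit"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_architecture(layer_logits, rng=random):
    """Sample one agentic workflow from the supernet.

    layer_logits: one logit vector per supernet layer (len == len(OPERATORS)).
    Picks an operator per layer from its categorical distribution; sampling
    "early_exit" stops the workflow early, so easier queries consume fewer
    layers -- the tailored resource allocation described above.
    """
    workflow = []
    for logits in layer_logits:
        probs = softmax(logits)
        op = rng.choices(OPERATORS, weights=probs, k=1)[0]
        if op == "early_exit":
            break
        workflow.append(op)
    return workflow

# Example: a 3-layer supernet with (query-conditioned) logits.
rng = random.Random(0)
arch = sample_architecture([[2.0, 1.0, 0.0, 0.0, -1.0]] * 3, rng)
```

In training, the per-layer logits would be produced by a controller conditioned on the query, and their gradients would be estimated from task feedback; this sketch shows only the sampling step.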
## Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy: 92.3 | 983 |
| Code Generation | HumanEval | Pass@1: 92.85 | 850 |
| Mathematical Reasoning | GSM8K (test) | Accuracy: 96.4 | 797 |
| Mathematical Reasoning | MATH | Accuracy: 51.23 | 535 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy: 47.1 | 433 |
| Mathematical Reasoning | GSM8K | Accuracy: 92.3 | 351 |
| Multi-hop Question Answering | HotpotQA (test) | -- | 198 |
| Mathematical Reasoning | MATH | Accuracy: 52.25 | 162 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM: 23.1 | 143 |
| Mathematical Reasoning | AQUA | Accuracy: 76.2 | 132 |