MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

About

Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across 6 diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.
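The two ideas named in the abstract, scoring an upstream prompt by how well the successor agent fares and searching the prompt space with a beam, can be illustrated with a toy sketch. This is not the authors' implementation: `toy_upstream`, `toy_downstream_utility`, `joint_score`, `mutate`, and `beam_search_prompts` are hypothetical names, the "agents" are deterministic string functions standing in for LLM calls, and the mutation step is a fixed list of edits rather than a data-driven evolutionary operator.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical prompt edits; a real system would generate these with an LLM.
EDITS = [" Think step by step.", " State the final answer."]
TASKS = ["task A", "task B"]  # unlabeled inputs: no ground-truth answers needed


def toy_upstream(prompt: str, task: str) -> str:
    """Stand-in for an upstream agent: its output depends on what the prompt asks for."""
    out = task
    if "step by step" in prompt:
        out += " [steps]"
    if "final answer" in prompt:
        out += " [answer]"
    return out


def toy_downstream_utility(upstream_output: str) -> float:
    """Stand-in for the successor agent: it does better the more it receives."""
    return (("[steps]" in upstream_output) + ("[answer]" in upstream_output)) / 2


def joint_score(prompt: str) -> float:
    """Score an upstream prompt by the mean downstream utility it enables."""
    utilities = [toy_downstream_utility(toy_upstream(prompt, t)) for t in TASKS]
    return sum(utilities) / len(utilities)


def mutate(prompt: str) -> List[str]:
    """Propose child prompts by appending each edit not already present."""
    return [prompt + e for e in EDITS if e not in prompt]


def beam_search_prompts(
    seeds: Sequence[str],
    mutate_fn: Callable[[str], List[str]],
    score_fn: Callable[[str], float],
    beam_width: int = 2,
    rounds: int = 2,
) -> List[Tuple[float, str]]:
    """Expand every prompt in the beam, rescore, and keep the top beam_width."""
    beam = sorted(((score_fn(p), p) for p in set(seeds)), reverse=True)[:beam_width]
    for _ in range(rounds):
        candidates = {p for _, p in beam}
        for _, prompt in beam:
            candidates.update(mutate_fn(prompt))
        beam = sorted(((score_fn(p), p) for p in candidates), reverse=True)[:beam_width]
    return beam


best = beam_search_prompts(["Solve the problem."], mutate, joint_score)
print(best[0])
```

In this toy setting a prompt that adds only one instruction earns half credit, so the beam keeps both partial variants and reaches a prompt containing both instructions in the second round. The point of the sketch is only that the selection signal comes from the successor agent's outcome rather than from a label on the upstream output.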

Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang • 2026

Related benchmarks

Task	Dataset	Result
Math Reasoning	AQUA	Accuracy 87.01	194
Code Generation	HumanEval-ET	--	108
Reasoning	GPQA Diamond	Accuracy 58.08	36
Mathematical Proficiency	MATH 500	Accuracy (MATH 500) 78.4	13
Mathematical Proficiency	AGIEval MATH Level-5	Accuracy 64.45	13
