
Latent Collaboration in Multi-Agent Systems

About

Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinated system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly within the continuous latent space. We introduce LatentMAS, an end-to-end training-free framework that enables pure latent collaboration among LLM agents. In LatentMAS, each agent first performs auto-regressive latent thought generation through its last-layer hidden embeddings. A shared latent working memory then preserves and transfers each agent's internal representations, ensuring lossless information exchange. We provide theoretical analyses establishing that LatentMAS attains higher expressiveness and lossless information preservation with substantially lower complexity than vanilla text-based MAS. In addition, empirical evaluations across 9 comprehensive benchmarks spanning math and science reasoning, commonsense understanding, and code generation show that LatentMAS consistently outperforms strong single-model and text-based MAS baselines, achieving up to 14.6% higher accuracy, reducing output token usage by 70.8%-83.7%, and providing 4x-4.3x faster end-to-end inference. These results demonstrate that our new latent collaboration framework enhances system-level reasoning quality while offering substantial efficiency gains without any additional training. Code and data are fully open-sourced at https://github.com/Gen-Verse/LatentMAS.
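The latent-collaboration loop described above can be illustrated with a toy sketch. This is a minimal, hypothetical stand-in, not the paper's implementation: the LLM forward pass is replaced by a fixed linear map so the example stays self-contained, and the shapes, the mean-pooled handoff, and the function names are illustrative assumptions. The structure it mirrors is the one the abstract describes: each agent generates latent thoughts auto-regressively from last-layer hidden states, and a shared latent working memory carries those states to the next agent without decoding to text.

```python
# Toy sketch of LatentMAS-style latent collaboration. HYPOTHETICAL:
# a fixed linear map stands in for an LLM's last-layer forward pass,
# and the mean-pooled handoff is an illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # toy hidden-state dimensionality

# Stand-in for one transformer forward pass: input embedding -> last-layer
# hidden state. A real system would run the LLM here.
W = rng.standard_normal((HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)

def forward(x):
    return np.tanh(W @ x)

def latent_thoughts(seed, n_steps):
    """Auto-regressive latent reasoning: each last-layer hidden state is fed
    back as the next step's input embedding, skipping text decoding."""
    h, trace = seed, []
    for _ in range(n_steps):
        h = forward(h)
        trace.append(h)
    return trace

# Agent 1 reasons in latent space; its hidden states go straight into a
# shared latent working memory (no lossy text serialization).
memory = []
memory.extend(latent_thoughts(rng.standard_normal(HIDDEN), n_steps=4))

# Agent 2 conditions on the shared memory (here: mean-pooled) and continues,
# appending its own latent thoughts to the same memory.
context = np.mean(memory, axis=0)
memory.extend(latent_thoughts(context, n_steps=4))

print(len(memory))  # 8 latent thoughts accumulated across both agents
```

The point of the sketch is the data flow: what crosses the agent boundary is a list of hidden-state vectors, never generated tokens, which is what makes the exchange lossless and cheap relative to text-mediated MAS.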

Jiaru Zou, Xiyuan Yang, Ruizhong Qiu, Gaotang Li, Katherine Tieu, Pan Lu, Ke Shen, Hanghang Tong, Yejin Choi, Jingrui He, James Zou, Mengdi Wang, Ling Yang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Code Generation | HumanEval+ | -- | 189 |
| Mathematical Problem Solving | MATH | Accuracy 78.6 | 166 |
| Medical Question Answering | MedQA | Accuracy 81.2 | 109 |
| Math Word Problem Solving | GSM8K | Accuracy 95.2 | 91 |
| Code Generation | MBPP+ | Accuracy 75.7 | 75 |
| Question Answering | GPQA Diamond | Accuracy 63.6 | 62 |
| Mathematical Problem Solving | AIME 25 | Accuracy 63.3 | 54 |
| Multi-task Evaluation | Aggregate (AIME25, AIME24, MATH, GSM8K, HumanEval+, MBPP+, MedQA, GPQA-Diamond) | Average Accuracy 75.9 | 21 |
| Math Problem Solving | AIME 24 | Accuracy 73.3 | 21 |
| Math | AIME24 | Accuracy 66.7 | 20 |

Showing 10 of 13 rows.

Other info

GitHub: https://github.com/Gen-Verse/LatentMAS
