
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

About

While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes AgentArk, a novel framework that distills multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of multi-agent systems while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scales, and scenarios: reasoning-enhanced fine-tuning, trajectory-based augmentation, and process-aware distillation. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of a single agent while exhibiting the strong reasoning and self-correction of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work can shed light on future research on efficient and robust multi-agent development. Our code is at https://github.com/AIFrontierLab/AgentArk.
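The trajectory-based augmentation strategy mentioned above can be illustrated with a minimal sketch: a multi-agent debate trajectory is flattened into a single supervised fine-tuning example, so a single model can learn to imitate the interaction implicitly. The function name, record layout, and formatting below are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch of trajectory-based augmentation: flatten a
# multi-agent debate into one (prompt, completion) fine-tuning record.
# All names and the record layout are assumptions for illustration.

def debate_to_sft_example(question, rounds, final_answer):
    """Turn a debate trajectory into a single training example.

    `rounds` is a list of per-round dicts mapping agent name -> utterance.
    The completion interleaves the debate as a reasoning trace, ending
    with the consensus answer, so explicit test-time interaction becomes
    an implicit capability learned at training time.
    """
    trace_lines = []
    for i, round_msgs in enumerate(rounds, start=1):
        for agent, msg in round_msgs.items():
            trace_lines.append(f"[Round {i} | {agent}] {msg}")
    completion = "\n".join(trace_lines) + f"\nFinal answer: {final_answer}"
    return {"prompt": question, "completion": completion}

example = debate_to_sft_example(
    "What is 12 * 7?",
    [{"Agent A": "12 * 7 = 84.", "Agent B": "I agree: 84."}],
    "84",
)
```

A dataset of such records could then be fed to any standard supervised fine-tuning loop, shifting the computational burden from inference to training as the abstract describes.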

Yinyi Luo, Yiqiao Jin, Weichen Yu, Mengqi Zhang, Srijan Kumar, Xiaoxiao Li, Weijie Xu, Xin Chen, Jindong Wang • 2026

Related benchmarks

Task                        Dataset            Result  Rank
Mathematical Reasoning      GSM8K (test)       --      797
Mathematical Reasoning      MATH               --      643
Mathematical Reasoning      MATH               --      535
Mathematical Reasoning      GSM8K              --      358
Medical Question Answering  MedMCQA            --      253
Mathematical Reasoning      GSM8K              --      171
Medical Question Answering  MedMCQA (test)     --      134
Mathematical Reasoning      MetaMathQA         --      54
Mathematical Reasoning      MetaMathQA (test)  --      26
Mathematical Reasoning      MATH (test)        --      18

Other info

GitHub
