MagicAgent: Towards Generalized Agent Planning

About

The evolution of Large Language Models (LLMs) from passive text processors to autonomous agents has established planning as a core component of modern intelligence. However, generalized planning remains elusive. It is hindered not only by the scarcity of high-quality interaction data but also by inherent conflicts across heterogeneous planning tasks. As a result, models excel at isolated tasks yet struggle to generalize, while existing multi-task training approaches suffer from gradient interference. In this paper, we present MagicAgent, a series of foundation models specifically designed for generalized agent planning. We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks, including hierarchical task decomposition, tool-augmented planning, multi-constraint scheduling, procedural logic orchestration, and long-horizon tool execution. To mitigate training conflicts, we propose a two-stage training paradigm: supervised fine-tuning followed by multi-objective reinforcement learning over both static datasets and dynamic environments. Empirical results show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks (e.g., 75.1% on WorfBench and 86.9% on BFCL-v3), as well as strong results on our in-house MagicEval benchmarks. They substantially outperform existing sub-100B models and surpass leading ultra-scale models, including GPT-5.2, Kimi-K2, and GLM-4.7.

Xuhui Ren, Shaokang Dong, Chen Yang, Qing Gao, Yunbin Zhao, Yongsheng Liu, Xinwei Geng, Xiang Li, Demei Yan, Yanqing Li, Chenhao Huang, Dingwei Zhu, Junjie Ye, Boxuan Yue, Yingnan Fu, Mengzhe Lv, Zezeng Feng, Boshen Zhou, Bocheng Wang, Xuanjing Huang, Yu-Gang Jiang, Tao Gui, Qi Zhang, Yunke Zhang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Cross-Lingual Planning	ACEBench	Score (En)	78.3
Hierarchical Task Decomposition	MagicEval-Plan Condition 3	Step Count	97.5
Hierarchical Task Decomposition	MagicEval-Plan Context Inheritance 3	Step Score	97.6
Multi-Constraint Scheduling	NaturalPlan	Trip Success Rate	48.6
Tool-Augmented Planning	BFCL-v3	Live Success Rate	84.1
Tool-Augmented Planning	MagicEval-Tool General	Name Accuracy	97.7
Tool-Augmented Planning	MagicEval-Tool Dependency	Name Accuracy	98.5
Tool-Augmented Planning	MagicEval-Tool Condition	Name Accuracy	95.4
Tool-Augmented Planning	MagicEval-Tool Context Inheritance	Name Accuracy	98.8
Workflow Planning	WorfBench	F1 Chain	80.3

10 of the 13 listed benchmark results are shown.
