MagicAgent: Towards Generalized Agent Planning
About
The evolution of Large Language Models (LLMs) from passive text processors to autonomous agents has established planning as a core component of modern intelligence. However, generalized planning remains elusive, hindered not only by the scarcity of high-quality interaction data but also by inherent conflicts across heterogeneous planning tasks. As a result, models excel at isolated tasks yet struggle to generalize, while existing multi-task training attempts suffer from gradient interference. In this paper, we present MagicAgent, a series of foundation models specifically designed for generalized agent planning. We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks, including hierarchical task decomposition, tool-augmented planning, multi-constraint scheduling, procedural logic orchestration, and long-horizon tool execution. To mitigate training conflicts, we propose a two-stage training paradigm comprising supervised fine-tuning followed by multi-objective reinforcement learning over both static datasets and dynamic environments. Empirical results demonstrate that MagicAgent-32B and MagicAgent-30B-A3B deliver superior performance, achieving accuracies of $75.1\%$ on WorfBench, $55.9\%$ on NaturalPlan, $57.5\%$ on $\tau^2$-Bench, $86.9\%$ on BFCL-v3, and $81.2\%$ on ACEBench, as well as strong results on our in-house MagicEval benchmarks. With these results, MagicAgent substantially outperforms existing sub-100B models and even surpasses leading closed-source models.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Cross-Lingual Planning | ACEBench | Score (En) | 78.3 | 14 |
| Hierarchical Task Decomposition | MagicEval-Plan Condition 3 | Step Count | 97.5 | 14 |
| Hierarchical Task Decomposition | MagicEval-Plan Context Inheritance 3 | Step Score | 97.6 | 14 |
| Multi-Constraint Scheduling | NaturalPlan | Trip Success Rate | 48.6 | 14 |
| Tool-Augmented Planning | BFCL V3 | Live Success Rate | 84.1 | 14 |
| Tool-Augmented Planning | MagicEval-Tool General | Name Accuracy | 97.7 | 14 |
| Tool-Augmented Planning | MagicEval-Tool Dependency | Name Acc | 98.5 | 14 |
| Tool-Augmented Planning | MagicEval-Tool Condition | Name Accuracy | 95.4 | 14 |
| Tool-Augmented Planning | MagicEval-Tool Context Inheritance | Name Accuracy | 98.8 | 14 |
| Workflow Planning | WorfBench | F1 Chain | 80.3 | 14 |