TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments

About

Large Language Model (LLM) agents are rapidly emerging as powerful systems for automating tasks across domains. Yet progress in the open-source community is constrained by the lack of high quality permissively licensed tool-agentic training data. Existing datasets are often limited in diversity, realism, and complexity, particularly regarding multi-tool and multi-turn interactions. To address this gap, we introduce Toucan, the largest publicly available tool-agentic dataset to date, containing 1.5 million trajectories synthesized from nearly 500 real-world Model Context Protocols (MCPs). Unlike prior work, Toucan leverages authentic MCP environments to generate diverse, realistic, and challenging tasks with trajectories involving real tool execution. Our pipeline first produces a broad spectrum of tool-use queries using five distinct models, applies model-based quality filtering, and then generates agentic trajectories with three teacher models using two agentic frameworks. Rigorous rule-based and model-based validation ensures high-quality outputs. We also introduce three extension mechanisms to further diversify tasks and simulate multi-turn conversations. Models fine-tuned on Toucan outperform larger closed-source counterparts on the BFCL V3 benchmark and push the Pareto frontier forward on MCP-Universe Bench.

Zhangchen Xu, Adriana Meza Soria, Shawn Tan, Anurag Roy, Ashish Sunil Agrawal, Radha Poovendran, Rameswar Panda• 2025

Related benchmarks

Task	Dataset	Result
Agentic Tool-use	tau2-Bench	Retail Score22.8	59
Agentic Workflow Success	τ2-bench	Airline Success Rate22	43
Tool Use	MCPMark	Total Success Rate1	26
Tool Use	BFCL Multi-turn	Accuracy37.03	24
Function Calling	BFCL v4	Multi-Turn Success Rate17.8	20
Multi-Turn Tool Calling	τ2-bench	Airline Score20	19
Tool-augmented Reasoning	BFCL Multi-Turn v3	Overall Score22.6	14
Tool Use	Tau-Bench	TAU-AIR Score33.5	14
Tool	BFCL V3	Accuracy59.5	12
Tool Use	ACEBench	Accuracy48.8	8

Showing 10 of 14 rows

Other info

Follow for update

@wizwand_team Discord