ToolACE: Winning the Points of LLM Function Calling

About

Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.

Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, Qun Liu, Enhong Chen • 2024

Related benchmarks

Task	Dataset	Metric	Result	
Function Calling	BFCL V3		--	104
Interactive Tool-Use Agent Performance	tau2-Bench	Retail Performance Score	38.7	102
Agentic Tool-use	tau2-Bench	Retail Score	0.00e+0	59
Agent Performance	Tau-Bench	Retail Accuracy	37.4	55
Multi-turn tool-use	BFCL Multi-Turn v3	Average Success Rate	38.5	48
Agent Performance	ACEBench Agent	Agent Score	52	36
Agentic Capability Evaluation	ACEBench-en	Normal Score	28.3	34
Function Calling	BFCL Live	Simple Accuracy	82.95	24
Tool Use	BFCL Multi-turn	Accuracy	37	24
Function Calling	API-Bank	Level-1 Score	75.94	20

Showing 10 of 26 rows
