ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph

About

Despite remarkable progress across a broad range of tasks, large language models (LLMs) remain significantly limited in using large numbers of external tools. Existing in-context learning approaches simply format the tools as a list of plain-text descriptions and feed it to the LLM, which then generates a sequence of tool calls to solve the problem step by step. This paradigm ignores the intrinsic dependencies between tools and offloads the entire reasoning burden to the LLM, restricting it to a small number of specifically designed tools. Operating over a massive tool library therefore remains challenging for LLMs, which is a major limitation in real-world scenarios. This paper proposes ToolNet, a plug-and-play framework that scales the number of tools to thousands with only a moderate increase in token consumption. ToolNet organizes tools into a directed graph in which each node represents a tool and weighted edges denote transitions between tools. Starting from an initial tool node, the LLM navigates the graph by iteratively choosing the next tool from the current node's successors until the task is resolved. Extensive experiments show that ToolNet achieves strong results on challenging multi-hop tool-learning datasets and is resilient to tool failures.
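The graph-navigation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ToolNet's implementation: it assumes edge weights are transition frequencies counted from example tool-call trajectories and normalized per source tool, and the hypothetical `choose` callback stands in for the LLM, receiving only the top-`k` successors of the current tool (plus a hypothetical `FINISH` option) instead of the whole tool library. The tool names, `build_tool_graph`, `top_successors`, `navigate`, and the `greedy_choose` stand-in are all invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Assumption (not from the paper): edge weights are transition frequencies
# counted from example tool-call trajectories, normalized per source tool.
Graph = Dict[str, Dict[str, float]]

FINISH = "FINISH"  # hypothetical option meaning "the task is resolved"


def build_tool_graph(trajectories: Sequence[Sequence[str]]) -> Graph:
    """Build a directed, weighted tool graph from observed tool-call sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        # Each consecutive pair of calls is one observed transition src -> dst.
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1
    graph: Graph = {}
    for src, succ in counts.items():
        total = sum(succ.values())
        graph[src] = {dst: n / total for dst, n in succ.items()}
    return graph


def top_successors(graph: Graph, tool: str, k: int) -> List[str]:
    """Return at most k successors of `tool`, highest edge weight first."""
    succ = graph.get(tool, {})
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(succ, key=lambda t: (-succ[t], t))[:k]


Chooser = Callable[[List[str], List[str]], str]


def navigate(graph: Graph, start: str, choose: Chooser,
             k: int = 3, max_steps: int = 10) -> List[str]:
    """Walk the graph from `start`, letting `choose` pick each next tool."""
    path = [start]
    for _ in range(max_steps):
        candidates = top_successors(graph, path[-1], k)
        if not candidates:
            break  # no outgoing edges: nothing left to call
        # Only k successors (plus FINISH) are offered, not the whole library.
        nxt = choose(path, candidates + [FINISH])
        if nxt == FINISH:
            break
        if nxt not in candidates:
            raise ValueError(f"chooser returned unknown tool {nxt!r}")
        path.append(nxt)
    return path


def greedy_choose(path: List[str], options: List[str]) -> str:
    """Hypothetical stand-in for the LLM: first option not yet visited."""
    for opt in options:
        if opt != FINISH and opt not in path:
            return opt
    return FINISH


trajectories = [
    ["search", "fetch_page", "summarize"],
    ["search", "fetch_page", "translate"],
    ["search", "fetch_page", "summarize"],
    ["search", "calculator"],
]
graph = build_tool_graph(trajectories)
path = navigate(graph, "search", greedy_choose, k=2)
print(path)  # ['search', 'fetch_page', 'summarize']
```

In a real system, `choose` would render the candidate tools' descriptions into a prompt and parse the LLM's reply. Capping the candidates at `k` per step is one plausible way to keep per-step token cost bounded as the library grows; the paper's exact prompting and weighting scheme may differ.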

Xukun Liu, Zhiyuan Peng, Xiaoyuan Yi, Xing Xie, Lirong Xiang, Yuchen Liu, Dongkuan Xu• 2024

Related benchmarks

Task	Dataset	Metric	Result
Function Selection	TinyAgent	Function Selection Accuracy	85.9	50
Function Selection	TaskBench HuggingFace	Function Selection Accuracy	70	45
Function Selection	TaskBench Multimedia	Function Selection Accuracy	73.8	36
Function Selection	TaskBench DailyLife	Function Selection Accuracy	84.1	36
Multi-turn Agent Task	ACEBench multi-turn (test)	Process Accuracy	66.4	31
Agent Task Completion	τ2-BENCH (test)	Average Task Reward	0.452	27
Agent Task Completion	ToolSandbox (test)	Average Task Reward	0.652	27
Agent Task Completion	τ-Bench (test)	Average Task Reward	0.652	27
Function Calling	TinyAgent	FSA	0.627	18
Function Calling	TB-HF	FSA	52.8	18

The table lists 10 of 17 benchmark results.
