ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph

About

While achieving remarkable progress in a broad range of tasks, large language models (LLMs) remain significantly limited in properly using massive external tools. Existing in-context learning approaches simply format tools into a list of plain text descriptions and input them to LLMs, from which the LLMs generate a sequence of tool calls to solve problems step by step. Such a paradigm ignores the intrinsic dependency between tools and offloads the entire reasoning load onto the LLMs, restricting them to a limited number of specifically designed tools. It thus remains challenging for LLMs to operate on a library of massive tools, which severely limits them in real-world scenarios. This paper proposes ToolNet, a plug-and-play framework that scales up the number of tools to thousands with a moderate increase in token consumption. ToolNet organizes tools into a directed graph. Each node represents a tool, and weighted edges denote tool transitions. Starting from an initial tool node, an LLM navigates the graph by iteratively choosing the next tool from the current node's successors until the task is resolved. Extensive experiments show that ToolNet can achieve impressive results on challenging multi-hop tool learning datasets and is resilient to tool failures.

Xukun Liu, Zhiyuan Peng, Xiaoyuan Yi, Xing Xie, Lirong Xiang, Yuchen Liu, Dongkuan Xu • 2024
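
The graph navigation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the graph contents, the function names (`successors`, `navigate`, `pick_heaviest`), and the stopping rule are all hypothetical, and a simple "pick the heaviest edge" rule stands in for the LLM's choice. The sketch only shows the general idea that, at each step, the chooser is offered the current tool's successors rather than the full tool library.

```python
from typing import Callable, Dict, List

# Hypothetical tool graph: tool name -> {successor tool: transition weight}.
ToolGraph = Dict[str, Dict[str, float]]


def successors(graph: ToolGraph, tool: str) -> List[str]:
    """Return the successors of `tool`, highest transition weight first."""
    edges = graph.get(tool, {})
    return sorted(edges, key=lambda name: edges[name], reverse=True)


def navigate(
    graph: ToolGraph,
    start: str,
    choose: Callable[[List[str], List[str]], str],
    is_done: Callable[[List[str]], bool],
    max_steps: int = 10,
) -> List[str]:
    """Walk the graph from `start`, asking `choose` for the next tool each step.

    `choose` stands in for the LLM: it receives the path so far and the
    candidate successors, and must return one of the candidates.
    """
    path = [start]
    current = start
    for _ in range(max_steps):
        if is_done(path):
            break
        candidates = successors(graph, current)
        if not candidates:
            break  # dead end: no outgoing edges
        current = choose(path, candidates)
        if current not in candidates:
            raise ValueError(f"{current!r} is not a successor of {path[-1]!r}")
        path.append(current)
    return path


def pick_heaviest(path: List[str], candidates: List[str]) -> str:
    """Placeholder chooser: always take the highest-weight successor."""
    return candidates[0]


# Toy graph with made-up tools and weights, for illustration only.
graph: ToolGraph = {
    "search": {"open_page": 0.7, "calculator": 0.3},
    "open_page": {"summarize": 0.9, "search": 0.1},
    "calculator": {"finish": 1.0},
    "summarize": {"finish": 1.0},
    "finish": {},
}

path = navigate(graph, "search", pick_heaviest, lambda p: p[-1] == "finish")
print(" -> ".join(path))
```

In a real system the `choose` callback would prompt an LLM with the candidate tools' descriptions. How the edge weights are obtained or updated is not specified in the abstract, so the sketch treats them as fixed inputs.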

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Function Selection | TinyAgent | Function Selection Accuracy: 85.9 | 50 |
| Function Selection | TaskBench HuggingFace | Function Selection Accuracy: 70 | 45 |
| Function Selection | TaskBench Multimedia | Function Selection Acc: 73.8 | 36 |
| Function Selection | TaskBench DailyLife | Function Selection Accuracy: 84.1 | 36 |
| Agent Task Completion | τ2-BENCH (test) | Average Task Reward: 0.452 | 27 |
| Agent Task Completion | ToolSandbox (test) | Avg Task Reward: 0.652 | 27 |
| Agent Task Completion | τ-Bench (test) | Average Task Reward: 0.652 | 27 |
| Function Calling | TinyAgent | FSA: 0.627 | 18 |
| Function Calling | TB-HF | FSA: 52.8 | 18 |
| Function Calling | TB-MM | FSA: 51.8 | 18 |
Showing 10 of 17 rows
