From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions

About

Tool learning enables Large Language Models (LLMs) to interact with external environments by invoking tools, serving as an effective strategy to mitigate the limitations inherent in their pre-training data. In this process, tool documentation plays a crucial role by providing usage instructions for LLMs, thereby facilitating effective tool utilization. This paper concentrates on the critical challenge of bridging the comprehension gap between LLMs and external tools due to the inadequacies and inaccuracies inherent in existing human-centric tool documentation. We propose a novel framework, DRAFT, aimed at Dynamically Refining tool documentation through the Analysis of Feedback and Trials emanating from LLMs' interactions with external tools. This methodology pivots on an innovative trial-and-error approach, consisting of three distinct learning phases: experience gathering, learning from experience, and documentation rewriting, to iteratively enhance the tool documentation. This process is further optimized by implementing a diversity-promoting exploration strategy to ensure explorative diversity and a tool-adaptive termination mechanism to prevent overfitting while enhancing efficiency. Extensive experiments on multiple datasets demonstrate that DRAFT's iterative, feedback-based refinement significantly ameliorates documentation quality, fostering a deeper comprehension and more effective utilization of tools by LLMs. Notably, our analysis reveals that the tool documentation refined via our approach demonstrates robust cross-model generalization capabilities.
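The three phases named in the abstract (experience gathering, learning from experience, documentation rewriting) can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`refine_documentation`, `explore`, `call_tool`, `analyze`, `rewrite`), the thresholds, and the use of word-level Jaccard overlap for both the diversity check and the termination check are all hypothetical stand-ins. In a real system the `explore`, `analyze`, and `rewrite` callbacks would be LLM calls; here they are toy stubs so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trial:
    query: str     # exploratory request sent to the tool
    response: str  # what the tool returned (result or error message)


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (an assumed stand-in metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def refine_documentation(
    doc: str,
    explore: Callable[[str, List[Trial]], str],
    call_tool: Callable[[str], str],
    analyze: Callable[[str, Trial], str],
    rewrite: Callable[[str, str], str],
    max_rounds: int = 5,
    dup_threshold: float = 0.8,   # assumed: skip queries this similar to past ones
    stop_threshold: float = 0.9,  # assumed: stop once a rewrite barely changes the doc
) -> Tuple[str, List[Trial]]:
    history: List[Trial] = []
    for _ in range(max_rounds):
        # Phase 1: experience gathering, with a crude diversity filter.
        query = explore(doc, history)
        if any(word_overlap(query, t.query) >= dup_threshold for t in history):
            continue
        trial = Trial(query, call_tool(query))
        history.append(trial)
        # Phase 2: learning from experience.
        suggestion = analyze(doc, trial)
        # Phase 3: documentation rewriting.
        new_doc = rewrite(doc, suggestion)
        # Termination: stop when the revision is nearly identical to the old doc.
        converged = word_overlap(new_doc, doc) >= stop_threshold
        doc = new_doc
        if converged:
            break
    return doc, history


def demo() -> Tuple[str, List[Trial]]:
    """Toy run with hand-written stubs in place of an LLM and a real tool."""
    queries = iter(["get weather", "get weather", "get weather city=Paris"])

    def explore(doc: str, history: List[Trial]) -> str:
        return next(queries, "get weather")

    def call_tool(query: str) -> str:
        return "ok" if "city=" in query else "error: parameter 'city' is required"

    def analyze(doc: str, trial: Trial) -> str:
        prefix = "error: "
        if trial.response.startswith(prefix):
            return "Note: " + trial.response[len(prefix):] + "."
        return ""

    def rewrite(doc: str, suggestion: str) -> str:
        if suggestion and suggestion not in doc:
            return doc + " " + suggestion
        return doc

    return refine_documentation("Returns the weather.", explore, call_tool, analyze, rewrite)
```

Calling `demo()` walks through one failed call that adds a note about the missing parameter, one repeated query that the diversity filter skips, and one successful call after which the documentation no longer changes and the loop stops. Making the stopping rule depend on how much each rewrite changes the text, rather than on a fixed round count, is one plausible reading of "tool-adaptive termination"; the abstract does not specify the actual criterion.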

Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, Ji-Rong Wen • 2024

Related benchmarks

Task	Dataset	Metric	Result	
Tool Use	ToolBench	Average Pass Rate	56.43	53
Tool Learning	RestBench TMDB	Success Rate	84.3	50
LLM Agent Evaluation	Tau-bench retail	Pass@1	48.4	38
Tool Use	StableToolBench	I2 Category Success	68.5	28
Sequential Tool Use	RestBench Spotify	Success Rate	85.2	22
Tool-use API Generalization	ToolBench G1 v1	Pass Rate	82.1	22
Tool-use API Generalization	ToolBench G2	Pass Rate	76.8	22
Tool-use API Generalization	ToolBench (G3)	Pass Rate	68.4	22
Function Calling	BFCL Single-Turn	Accuracy	77.8	22
Stateful Agent-User Interaction	Tau-bench airline	Pass@1	29.2	22

Showing 10 of 22 rows
