
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions

About

Tool learning enables Large Language Models (LLMs) to interact with external environments by invoking tools, serving as an effective strategy to mitigate the limitations inherent in their pre-training data. In this process, tool documentation plays a crucial role by providing usage instructions for LLMs, thereby facilitating effective tool utilization. This paper concentrates on the critical challenge of bridging the comprehension gap between LLMs and external tools due to the inadequacies and inaccuracies inherent in existing human-centric tool documentation. We propose a novel framework, DRAFT, aimed at Dynamically Refining tool documentation through the Analysis of Feedback and Trials emanating from LLMs' interactions with external tools. This methodology pivots on an innovative trial-and-error approach, consisting of three distinct learning phases: experience gathering, learning from experience, and documentation rewriting, to iteratively enhance the tool documentation. This process is further optimized by implementing a diversity-promoting exploration strategy to ensure explorative diversity and a tool-adaptive termination mechanism to prevent overfitting while enhancing efficiency. Extensive experiments on multiple datasets demonstrate that DRAFT's iterative, feedback-based refinement significantly ameliorates documentation quality, fostering a deeper comprehension and more effective utilization of tools by LLMs. Notably, our analysis reveals that the tool documentation refined via our approach demonstrates robust cross-model generalization capabilities.
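The abstract names three phases (experience gathering, learning from experience, documentation rewriting) plus a diversity-promoting exploration strategy and a tool-adaptive termination mechanism, but it gives no implementation details. The following is only a minimal sketch of how such a loop could be wired together, not the authors' implementation. The function `refine_documentation`, the callables `explore`, `call_tool`, `analyze`, and `rewrite`, and both thresholds are hypothetical names and values chosen for illustration. String similarity from Python's `difflib` stands in for whatever similarity measure the paper actually uses.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (query, tool result) pairs


def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1]; a stand-in for the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def refine_documentation(
    doc: str,
    explore: Callable[[str, History], str],   # hypothetical: proposes a trial query
    call_tool: Callable[[str], str],          # hypothetical: runs the tool on a query
    analyze: Callable[[str, str, str], str],  # hypothetical: turns feedback into a suggestion
    rewrite: Callable[[str, str], str],       # hypothetical: applies the suggestion to the doc
    max_iters: int = 5,
    diversity_threshold: float = 0.9,         # assumed value
    stop_threshold: float = 0.95,             # assumed value
    max_retries: int = 3,
) -> Tuple[str, History]:
    history: History = []
    for _ in range(max_iters):
        # Phase 1: experience gathering. Re-sample the query while it is
        # too similar to an earlier one (diversity-promoting exploration).
        query = explore(doc, history)
        retries = 0
        while retries < max_retries and any(
            similarity(query, past) >= diversity_threshold for past, _ in history
        ):
            query = explore(doc, history)
            retries += 1
        result = call_tool(query)
        history.append((query, result))

        # Phase 2: learning from experience.
        suggestion = analyze(doc, query, result)

        # Phase 3: documentation rewriting.
        new_doc = rewrite(doc, suggestion)

        # Termination: stop once a rewrite barely changes the documentation,
        # so each tool gets only as many rounds as it needs.
        converged = similarity(doc, new_doc) >= stop_threshold
        doc = new_doc
        if converged:
            break
    return doc, history
```

In practice the four callables would presumably wrap LLM prompts and a real tool API. Keeping them as parameters lets the loop be exercised with simple stubs.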

Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, Ji-Rong Wen • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Tool Learning | RestBench TMDB | Success Rate | 84.3 | 32
Tool Use | ToolBench | Average Pass Rate | 56.43 | 29
Tool Use | StableToolBench | I2 Category Success | 68.5 | 28
Sequential Tool Use | RestBench Spotify | Success Rate | 85.2 | 22
Tool-use API Generalization | ToolBench G1 v1 | Pass Rate | 82.1 | 22
Tool-use API Generalization | ToolBench G2 | Pass Rate | 76.8 | 22
Tool-use API Generalization | ToolBench (G3) | Pass Rate | 68.4 | 22
Function Calling | BFCL Single-Turn | Accuracy | 77.8 | 22
LLM Agent Evaluation | Tau-bench retail | Pass@1 | 48.4 | 22
Stateful Agent-User Interaction | Tau-bench airline | Pass@1 | 29.2 | 22
Showing 10 of 20 rows
