
Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use

About

The performance of LLM-based agents depends not only on the agent itself but also on the quality of the tool interfaces it consumes. While prior work has focused heavily on agent fine-tuning, tool interfaces (including natural language descriptions and parameter schemas) remain largely human-oriented and often become a bottleneck, especially when agents must select from large candidate tool sets. Existing approaches to improving tool interfaces rely on execution traces, which are frequently unavailable in cold-start or privacy-constrained settings, and typically optimize each tool independently, limiting scalability and generalization to unseen tools. We propose Trace-Free+, a curriculum learning framework that progressively transfers supervision from trace-rich settings to trace-free deployment, encouraging the model to abstract reusable interface-usage patterns and tool usage outcomes. To support this approach, we construct a large-scale dataset of high-quality tool interfaces using a structured workflow over a diverse collection of tools. Experiments on StableToolBench and RestBench show consistent gains on unseen tools, strong cross-domain generalization, and robustness as the number of candidate tools scales to over 100, demonstrating that tool interface optimization is a practical and deployable complement to agent fine-tuning.

Ruocheng Guo, Kaiwen Dong, Xiang Gao, Kamalika Das • 2026
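
The abstract describes a curriculum that moves from trace-rich supervision toward trace-free deployment, but it does not give the schedule. The following is only a minimal sketch of one way such a transition could look, assuming a linear decay in how often execution traces are shown to the rewriting model during training. The function names (`trace_keep_probability`, `build_prompt`), the prompt wording, and the linear schedule are all hypothetical and are not taken from the paper.

```python
import random


def trace_keep_probability(step: int, total_steps: int) -> float:
    """Hypothetical schedule: probability of including traces at a given step.

    Decays linearly from 1.0 (first step, trace-rich) to 0.0 (last step,
    trace-free). The linear shape is an assumption, not the paper's schedule.
    """
    if total_steps <= 1:
        return 0.0
    fraction_done = min(max(step / (total_steps - 1), 0.0), 1.0)
    return 1.0 - fraction_done


def build_prompt(
    description: str,
    traces: list[str],
    step: int,
    total_steps: int,
    rng: random.Random,
) -> str:
    """Assemble a rewriting prompt, dropping traces more often as training proceeds."""
    parts = [f"Original tool description:\n{description}"]
    # Include traces only with the scheduled probability (assumed mechanism).
    if traces and rng.random() < trace_keep_probability(step, total_steps):
        parts.append("Execution traces:\n" + "\n".join(traces))
    parts.append("Rewrite the description for an LLM agent:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    rng = random.Random(0)
    example_traces = ["call get_weather(city='Paris') -> 200 OK"]
    for step in (0, 5, 10):
        prompt = build_prompt("Gets weather.", example_traces, step, 11, rng)
        print(step, "Execution traces" in prompt)
```

Under this assumed schedule, the first training step always sees traces and the final step never does, so the model ends training on inputs that match the trace-free deployment setting.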

Related benchmarks

Task            Dataset                            Metric               Result  Rank
Tool Use        StableToolBench G1 Category        SL                   76.8    12
Tool Use        StableToolBench G1 Instruction     SL Score             75.5    6
Tool Use        StableToolBench G2 Category        SL                   71      6
Tool Use        StableToolBench G2 Instruction     SL Score             68.8    6
Tool Use        StableToolBench Overall Average    SL (Success Rate)    70.3    6
Tool Use        StableToolBench G3 Instruction     SL Score             60.7    6
Tool Use        StableToolBench v1 (test)          G1 Category SL       73.8    5
Tool Execution  Trace-based setting                Improvement (%)      14.8    4
Tool Selection  Trace-based setting                Improvement          6.8     4
Tool Use        StableToolBench trace-free (test)  F1 Score (Impr Pts)  6.8     4
