WHEN2TOOL

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Single-hop Tool Calling | WHEN2TOOL single-hop 1.0 (test) | Accuracy | 94.3 | 90 |
| Tool Calling | WHEN2TOOL Overall | Δ Accuracy (ΔAcc) | -1 | 7 |
| Tool Calling | WHEN2TOOL Hard | Δ Accuracy (ΔAcc) | -0.8 | 7 |
| Tool Calling | WHEN2TOOL Medium | Δ Accuracy (ΔAcc) | -0.7 | 7 |
| Tool Calling | WHEN2TOOL Easy | Δ Accuracy (ΔAcc) | -0.3 | 7 |