ToolBench

Benchmarks

Task Name                    Dataset Name              Metric                      SOTA Result  Trend
Tool-use                     ToolBench                 Average Success Rate (ASR)  99.61        62
Tool-use                     ToolBench                 Average Pass Rate           80.67        53
Tool Retrieval               ToolBench                 NDCG@10                     58.54        44
Function Calling             ToolBench Average         Pass Rate                   75.95        30
Tool Retrieval               ToolBench In-domain I1    NDCG@1                      93.76        29
Tool-use Reasoning           ToolBench                 API Success Rate            88           28
Tool use                     ToolBench (test)          Pass@1                      83.7         28
Tool Planning                ToolBench                 EM (%)                      42.6         24
Invocation attack            ToolBench                 CDA                         98           24
Tool Reasoning               ToolBench (G3)            Pass Rate                   91.8         24
Tool Reasoning               ToolBench G2              Pass Rate                   93           24
Tool Reasoning               ToolBench (G1)            Pass Rate                   85.5         24
Tool-use API Generalization  ToolBench G3              Pass Rate                   71.5         22
Tool-use API Generalization  ToolBench G2              Pass Rate                   78.2         22
Tool-use API Generalization  ToolBench G1 v1           Pass Rate                   83.5         22
Tool Retrieval               ToolBench In-domain (I3)  NDCG@1                      91.74        20
Tool Retrieval               ToolBench In-domain (I2)  NDCG@1                      91.91        20
Tool Use                     ToolBench                 Energy (Wh)                 5.6          18
Throughput Efficiency        ToolBench                 Throughput (tokens/s)       4,602        18
LLM Inference                ToolBench                 Goodput (req/s)             3.9          18
Agent Task                   ToolBench                 Success Rate                44.98        16
End-to-end Tool-use          ToolBench I1 v1           SoPR                        56.13        16
Function Calling             ToolBench I3-Inst         Pass Rate                   52.4         14
Function Calling             ToolBench I2-Inst         Pass Rate                   71.4         14
Function Calling             ToolBench I1-Inst         Pass Rate                   57.1         14
Showing 25 of 90 rows
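Several Tool Retrieval rows report NDCG@k. The leaderboard does not define the metric, so what follows is only a minimal sketch of the common linear-gain formulation, not necessarily the variant ToolBench evaluations use. It assumes `relevances` holds graded relevance labels for the full ranked candidate list (so every relevant tool appears in it); some implementations use an exponential gain of 2^rel - 1 instead, and the table's scores appear to be this value scaled by 100. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items.

    Linear-gain form (an assumption): rel / log2(rank + 1) for 1-based rank,
    which is log2(i + 2) for the 0-based index i used by enumerate.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering.

    Assumes `relevances` covers the whole ranked list, so sorting it
    yields the ideal ranking.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items: define NDCG as 0 by convention
    return dcg_at_k(relevances, k) / ideal
```

For example, `ndcg_at_k([0, 1], 2)` places the only relevant tool at rank 2 and returns 1/log2(3), roughly 0.63, while `ndcg_at_k([0, 1], 1)` returns 0.0 because the top-ranked tool is irrelevant.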