
ToolBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool Retrieval | ToolBench | NDCG@10 58.54 | 44 |
| Tool Retrieval | ToolBench In-domain I1 | NDCG@1 93.76 | 29 |
| Tool-use | ToolBench | Average Pass Rate 71.3 | 29 |
| Tool use | ToolBench (test) | Pass@1 83.7 | 28 |
| Invocation attack | ToolBench | CDA 98 | 24 |
| Tool Reasoning | ToolBench (G3) | Pass Rate 91.8 | 24 |
| Tool Reasoning | ToolBench G2 | Pass Rate 93 | 24 |
| Tool Reasoning | ToolBench (G1) | Pass Rate 85.5 | 24 |
| Tool-use API Generalization | ToolBench G3 | Pass Rate 71.5 | 22 |
| Tool-use API Generalization | ToolBench G2 | Pass Rate 78.2 | 22 |
| Tool-use API Generalization | ToolBench G1 v1 | Pass Rate 83.5 | 22 |
| Tool Retrieval | ToolBench In-domain (I3) | NDCG@1 91.74 | 20 |
| Tool Retrieval | ToolBench In-domain (I2) | NDCG@1 91.91 | 20 |
| Tool Use | ToolBench | Energy (Wh) 5.6 | 18 |
| Throughput Efficiency | ToolBench | Throughput (tokens/s) 4,602 | 18 |
| Tool-use | ToolBench | Average Token Length 127 | 18 |
| LLM Inference | ToolBench | Goodput (req/s) 3.9 | 18 |
| End-to-end Tool-use | ToolBench I1 v1 | SoPR 56.13 | 16 |
| Function Calling | ToolBench Average | Pass Rate 60.3 | 14 |
| Function Calling | ToolBench I3-Inst | Pass Rate 52.4 | 14 |
| Function Calling | ToolBench I2-Inst | Pass Rate 71.4 | 14 |
| Function Calling | ToolBench I1-Inst | Pass Rate 57.1 | 14 |
| Tool Use | ToolBench 50 APIs v1 (test) | Wellformedness 99.2 | 14 |
| Tool Retrieval | ToolBench I3 (test) | Recall@3 76.63 | 13 |
| Tool Retrieval | ToolBench I2 (test) | Recall@3 75.72 | 13 |
Showing 25 of 70 rows