ToolBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool Retrieval | ToolBench | NDCG@10: 58.54 | 44 |
| Tool Retrieval | ToolBench In-domain I1 | NDCG@1: 93.76 | 29 |
| Tool-use | ToolBench | Average Pass Rate: 71.3 | 29 |
| Tool Reasoning | ToolBench (G3) | Pass Rate: 91.8 | 24 |
| Tool Reasoning | ToolBench G2 | Pass Rate: 93 | 24 |
| Tool Reasoning | ToolBench (G1) | Pass Rate: 85.5 | 24 |
| Tool Retrieval | ToolBench In-domain (I3) | NDCG@1: 91.74 | 20 |
| Tool Retrieval | ToolBench In-domain (I2) | NDCG@1: 91.91 | 20 |
| Tool Use | ToolBench | Energy (Wh): 5.6 | 18 |
| Throughput Efficiency | ToolBench | Throughput (tokens/s): 4,602 | 18 |
| Tool-use | ToolBench | Average Token Length: 127 | 18 |
| LLM Inference | ToolBench | Goodput (req/s): 3.9 | 18 |
| End-to-end Tool-use | ToolBench I1 v1 | SoPR: 56.13 | 16 |
| Function Calling | ToolBench Average | Pass Rate: 60.3 | 14 |
| Function Calling | ToolBench I3-Inst | Pass Rate: 52.4 | 14 |
| Function Calling | ToolBench I2-Inst | Pass Rate: 71.4 | 14 |
| Function Calling | ToolBench I1-Inst | Pass Rate: 57.1 | 14 |
| Tool Use | ToolBench 50 APIs v1 (test) | Wellformedness: 99.2 | 14 |
| Tool Retrieval | ToolBench I3 (test) | Recall@3: 76.63 | 13 |
| Tool Retrieval | ToolBench I2 (test) | Recall@3: 75.72 | 13 |
| Tool planning | ToolBench G1 set | Win Rate (G1-Instruction): 88.192 | 13 |
| Tool-use Planning | ToolBench Average over all sets | Win Rate: 86.54 | 13 |
| Tool-use Planning | ToolBench G3-Instruction | Win Rate: 0.9368 | 13 |
| Tool-use Planning | ToolBench G2-Category | Win Rate: 78.78 | 13 |
| Tool-use Planning | ToolBench G2-Instruction | Win Rate: 87.59 | 13 |
Showing 25 of 46 rows