StableToolBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool Use | StableToolBench | I2 Category Success: 72.8 | 28 |
| Next-state prediction | StableToolBench (STB) | EM Accuracy: 49.25 | 16 |
| Tool Use | StableToolBench cost-augmented | PR: 76 | 14 |
| Agent Tool Use | StableToolBench Held-In | Pass Rate: 50.4 | 14 |
| Tool Learning | StableToolBench Average | SoPR: 70.3 | 13 |
| Tool Learning | StableToolBench I3-Inst. | SoPR: 76 | 13 |
| Tool Learning | StableToolBench I2-Cat. | SoPR: 71.9 | 13 |
| Tool Learning | StableToolBench I2-Inst. | SoPR: 73.4 | 13 |
| Tool Learning | StableToolBench I1-Cat. | SoPR: 70.9 | 13 |
| Tool Learning | StableToolBench I1-Tool | SoPR: 73.9 | 13 |
| Tool Learning | StableToolBench I1-Inst. | SoPR: 69 | 13 |
| Tool Use | StableToolBench G1 Category | SL: 76.8 | 12 |
| Tool orchestration | StableToolBench 1.0 (test) | I1 Instruction Success Rate: 50.3 | 10 |
| API Execution Simulation | StableToolBench | ID High Success Rate: 16.47 | 8 |
| Tool Use | StableToolBench Overall Average | SL (Success Rate): 70.3 | 6 |
| Tool Use | StableToolBench G3 Instruction | SL Score: 66.3 | 6 |
| Tool Use | StableToolBench G2 Instruction | SL Score: 68.8 | 6 |
| Tool Use | StableToolBench G2 Category | SL: 71 | 6 |
| Tool Use | StableToolBench G1 Instruction | SL Score: 75.5 | 6 |
| Tool calling | StableToolBench (STB) I3-Inst | Solvable Pass Rate: 48.3 | 6 |
| Tool Use | StableToolBench v1 (test) | G1 Category SL: 75.5 | 5 |
| Tool Use | StableToolBench trace-free (test) | F1 Score (Impr Pts): 6.8 | 4 |