StableToolBench

Benchmarks

Task Name	Dataset Name	SOTA Result
Tool Use	StableToolBench	I2 Category Success 72.8	28
Next-state prediction	StableToolBench (STB)	EM Accuracy 49.25	16
Tool Use	StableToolBench cost-augmented	PR 76	14
Agent Tool Use	StableToolBench Held-In	Pass Rate 50.4	14
Tool Learning	StableToolBench Average	SoPR 70.3	13
Tool Learning	StableToolBench I3-Inst.	SoPR 76	13
Tool Learning	StableToolBench I2-Cat.	SoPR 71.9	13
Tool Learning	StableToolBench I2-Inst.	SoPR 73.4	13
Tool Learning	StableToolBench I1-Cat.	SoPR 70.9	13
Tool Learning	StableToolBench I1-Tool	SoPR 73.9	13
Tool Learning	StableToolBench I1-Inst.	SoPR 69	13
Tool Use	StableToolBench G1 Category	SL 76.8	12
Tool orchestration	StableToolBench 1.0 (test)	I1 Instruction Success Rate 50.3	10
API Execution Simulation	StableToolBench	ID High Success Rate 16.47	8
Tool Use	StableToolBench Overall Average	SL (Success Rate) 70.3	6
Tool Use	StableToolBench G3 Instruction	SL Score 66.3	6
Tool Use	StableToolBench G2 Instruction	SL Score 68.8	6
Tool Use	StableToolBench G2 Category	SL 71	6
Tool Use	StableToolBench G1 Instruction	SL Score 75.5	6
Tool calling	StableToolBench (STB) I3-Inst	Solvable Pass Rate 48.3	6
Tool Use	StableToolBench v1 (test)	G1 Category SL 75.5	5
Tool Use	StableToolBench trace-free (test)	F1 Score (Impr Pts) 6.8	4