Stable Toolbench

Benchmarks

Task Name     Dataset Name              SOTA Result       Trend
Tool-calling  Stable Toolbench I3-Inst  Pass Rate 68.85   16
Tool-calling  Stable Toolbench I2-Cat   Pass Rate 78.3    16
Tool-calling  Stable Toolbench I2-Inst  Pass Rate 77.42   16
Tool-calling  Stable Toolbench I1-Tool  Pass Rate 75.95   16
Tool-calling  Stable Toolbench I1-Cat   Pass Rate 76.07   16
Tool-calling  Stable Toolbench I1-Inst  Pass Rate 0.7909  16