Tool Use on StableToolBench Overall Average
[Chart: SL (Success Rate) over time; top result 70.3 (Trace-Based), Feb 23, 2026]
Evaluation Results
| Method | Configuration | Date | SL (Success Rate) | QL (Quality Score) |
|---|---|---|---|---|
| Trace-Based | Evaluation Protocol=Tr... | 2026.02 | 70.3 | 54.6 |
| Play2Prompt | Evaluation Protocol=Tr... | 2026.02 | 69.8 | 52.5 |
| D2 | Evaluation Protocol=Tr... | 2026.02 | 69.5 | 52.5 |
| DRAFT | Evaluation Protocol=Tr... | 2026.02 | 68.1 | 50.0 |
| D0 | Evaluation Protocol=Tr... | 2026.02 | 67.3 | 48.0 |
| D1 | Evaluation Protocol=Tr... | 2026.02 | 66.5 | 49.4 |
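The table is ordered by SL. To check whether QL produces the same ordering, the rows can be re-ranked by each metric. The following is a minimal Python sketch. It assumes the `rows` list is a faithful hand transcription of the table and does not fetch anything from the leaderboard. The helper `rank_by` is hypothetical and exists only for illustration.

```python
# Rows hand-transcribed from the Evaluation Results table: (method, SL, QL).
rows = [
    ("Trace-Based", 70.3, 54.6),
    ("Play2Prompt", 69.8, 52.5),
    ("D2", 69.5, 52.5),
    ("DRAFT", 68.1, 50.0),
    ("D0", 67.3, 48.0),
    ("D1", 66.5, 49.4),
]


def rank_by(rows, index):
    """Hypothetical helper: method names sorted by one metric column, highest first.

    sorted() is stable even with reverse=True, so tied entries keep table order.
    """
    return [r[0] for r in sorted(rows, key=lambda r: r[index], reverse=True)]


sl_order = rank_by(rows, 1)  # column 1 = SL (Success Rate)
ql_order = rank_by(rows, 2)  # column 2 = QL (Quality Score)

# SL gap between the top two entries, rounded to hide floating-point noise.
sl_margin = round(rows[0][1] - rows[1][1], 1)

if __name__ == "__main__":
    print("By SL:", sl_order)
    print("By QL:", ql_order)
    print("SL margin of top entry:", sl_margin)
```

Ranking by QL swaps only D0 and D1. D1 has the lowest SL (66.5) but a higher QL (49.4) than D0 (48.0). Play2Prompt and D2 tie on QL at 52.5.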