Tool Use on StableToolBench G2 Category
[Chart: SL and QL scores. Highlighted point: Trace-Based, SL 71 (Feb 23, 2026).]
Updated 4d ago
Evaluation Results
| Method | Evaluation Protocol | Date | SL | QL |
|---|---|---|---|---|
| Trace-Based | Tr... | 2026.02 | 71 | 50.7 |
| D2 | Tr... | 2026.02 | 70.7 | 47.1 |
| D0 | Tr... | 2026.02 | 68.4 | 42.4 |
| D1 | Tr... | 2026.02 | 67.9 | 43.1 |
| Play2Prompt | Tr... | 2026.02 | 66.7 | 42.4 |
| DRAFT | Tr... | 2026.02 | 66.3 | 45.3 |