Tool Use on StableToolBench G1 Category
[Chart: SL and QL scores. Highlighted: 76.8 SL (Trace-Based), Feb 23, 2026.]
Updated 4d ago
Evaluation Results

Method        Details                     Date      SL     QL
Trace-Based   Evaluation Protocol=Tr...   2026.02   76.8   64.5
D2            Evaluation Protocol=Tr...   2026.02   75.9   66.4
D1            Evaluation Protocol=Tr...   2026.02   75.5   64.9
Play2Prompt   Evaluation Protocol=Tr...   2026.02   74.6   66.2
DRAFT         Evaluation Protocol=Tr...   2026.02   73.2   60.4
D0            Evaluation Protocol=Tr...   2026.02   73     62.4
Play2Prompt   Evaluation Protocol=Tr...   2026.02   71.3   55.2
D2            Evaluation Protocol=Tr...   2026.02   71.1   57.3
D0            Evaluation Protocol=Tr...   2026.02   71     52.8
Trace-Based   Evaluation Protocol=Tr...   2026.02   69.1   54.7
D1            Evaluation Protocol=Tr...   2026.02   68     53.7
DRAFT         Evaluation Protocol=Tr...   2026.02   67.8   48.5