Tool-use instruction following on ToolBench v1 (Evaluation)
Figure: Success Rate over time (top result: AgentMark, 71.7, Jan 5, 2026). Other selectable metrics: Total Steps, BPS Rate, BPT Rate, Delta Steps per Step, Delta Tokens per Step (%).
Evaluation Results

| Method | Task | Date | Success Rate | Total Steps | BPS Rate | BPT Rate | Delta Steps per Step | Delta Tokens per Step (%) |
|---|---|---|---|---|---|---|---|---|
| AgentMark | T4: Multi-tool wi... | 2026.01 | 71.7 | 6.5 | 0.49 | 5.1 | -1.54 | 10.97 |
| AgentMark | T2: Single-tool i... | 2026.01 | 61.7 | 8.6 | 0.48 | 4.89 | -1.41 | -16.96 |
| AgentMark | T1: Single-tool s... | 2026.01 | 60 | 5.6 | 0.51 | 5.28 | -1.67 | 9.64 |
| AgentMark | T3: Single-tool w... | 2026.01 | 60 | 4.8 | 0.46 | 4.62 | -2.4 | -3.29 |
| AgentMark | Avg. | 2026.01 | 59.7 | 7.2 | 0.49 | 4.93 | -1.27 | -6.25 |
| AgentMark | T6: Complex multi... | 2026.01 | 55 | 9.6 | 0.49 | 4.9 | 0.24 | -14.77 |
| AgentMark | T5: Multi-tool in... | 2026.01 | 50 | 7.8 | 0.48 | 4.85 | -0.83 | 5.24 |
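As a worked check on the Avg. row, the sketch below recomputes two of its columns as an unweighted mean of the six task rows (T1 to T6). Equal weighting per task is an assumption here, since the page does not state how the average is computed; the function name `unweighted_mean` is hypothetical. For Success Rate and Delta Steps per Step this reproduces the listed 59.7 and -1.27 after rounding.

```python
from statistics import fmean

# Per-task values copied from the table above (T1..T6); the "Avg." row is excluded.
success_rate = {"T1": 60.0, "T2": 61.7, "T3": 60.0, "T4": 71.7, "T5": 50.0, "T6": 55.0}
delta_steps_per_step = {"T1": -1.67, "T2": -1.41, "T3": -2.40, "T4": -1.54, "T5": -0.83, "T6": 0.24}


def unweighted_mean(column: dict[str, float]) -> float:
    """Mean with equal weight per task (assumed; the page does not state the averaging rule)."""
    return fmean(column.values())


print(round(unweighted_mean(success_rate), 1))          # → 59.7
print(round(unweighted_mean(delta_steps_per_step), 2))  # → -1.27
```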