Tool Use on Simulated 120-tool benchmark 500 tasks 1.0
[Chart: Tokens per Turn by method. Highlighted point: B4 CLI Lazy, 480 (Apr 23, 2026). Selectable metrics: Tokens per Turn, Rho T30 Score, Success Rate, P50 Latency (s), P95 Latency (s), Cost per Task ($).]
Updated 1mo ago
Evaluation Results
| Method | Date | Tokens per Turn | Rho T30 Score | Success Rate (%) | P50 Latency (s) | P95 Latency (s) | Cost per Task ($) |
|---|---|---|---|---|---|---|---|
| B4 CLI Lazy | 2026.04 | 480 | 0.94 | 88 | 2.4 | 5.4 | 0.03 |
| Tool Attention | 2026.04 | 2,368 | 0.91 | 94 | 2 | 4.3 | 0.03 |
| B3 Simple Retrieval | 2026.04 | 4,082 | 0.78 | 81 | 2.2 | 4.6 | 0.04 |
| B2 Static Pruning | 2026.04 | 11,865 | 0.56 | 58 | 3.8 | 7.1 | 0.09 |
| B1 Full-Schema | 2026.04 | 47,312 | 0.24 | 72 | 4.2 | 7.9 | 0.21 |
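
To put the Tokens per Turn figures in the Evaluation Results above on a common scale, the short sketch below expresses each method's token usage as a percentage reduction relative to B1 Full-Schema. Treating B1 Full-Schema as the reference point is an assumption made here for illustration, since the page does not designate a baseline, and the function name `reduction_vs_baseline` is hypothetical. The input values are copied from the results; nothing else is measured or derived from the benchmark itself.

```python
# Tokens per Turn, copied from the Evaluation Results on this page.
tokens_per_turn = {
    "B4 CLI Lazy": 480,
    "Tool Attention": 2368,
    "B3 Simple Retrieval": 4082,
    "B2 Static Pruning": 11865,
    "B1 Full-Schema": 47312,
}


def reduction_vs_baseline(tokens, baseline="B1 Full-Schema"):
    """Return each method's percentage reduction in tokens per turn.

    The baseline choice is an assumption for illustration only; the
    benchmark page does not name a reference method.
    """
    base = tokens[baseline]
    return {
        name: round(100 * (1 - value / base), 1)
        for name, value in tokens.items()
    }


reductions = reduction_vs_baseline(tokens_per_turn)
for name, pct in reductions.items():
    print(f"{name}: {pct}% fewer tokens per turn than B1 Full-Schema")
```

Token reduction is only one axis. The same results list Success Rate and latency per method, so a comparison based on this ratio alone would not capture the full trade-off.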