Terminal Command Execution on TerminalBench
[Chart: Success Rate and LLM Utilization (%) by method. Top entry: LLM-only, Success Rate 98.1, May 15, 2026. Updated 15d ago.]
Evaluation Results
Method            Details                    Date     Success Rate  LLM Utilization (%)
LLM-only          Backbones=Averaged ove...  2026.05  98.1          100
Oracle Router     Backbones=Averaged ove...  2026.05  97.8          18.4
Heuristic Router  Backbones=Averaged ove...  2026.05  95            66
R2V               Backbones=Averaged ove...  2026.05  93.3          33.9
Entropy Router    Backbones=Averaged ove...  2026.05  87.4          4.1
SLM-only          Backbones=Averaged ove...  2026.05  81.1          0
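The results above trade success rate against how often the LLM is called. One way to read them, sketched here in Python, is to ask how much of the gap between SLM-only and LLM-only each method closes. The `gap_recovered` metric and the function names are illustrative assumptions, not something the leaderboard itself reports; the numbers are copied from the Evaluation Results entries.

```python
# Values copied from the Evaluation Results entries (2026.05):
# method -> (success rate, LLM utilization in %).
RESULTS = {
    "LLM-only": (98.1, 100.0),
    "Oracle Router": (97.8, 18.4),
    "Heuristic Router": (95.0, 66.0),
    "R2V": (93.3, 33.9),
    "Entropy Router": (87.4, 4.1),
    "SLM-only": (81.1, 0.0),
}


def gap_recovered(success, slm=81.1, llm=98.1):
    """Hypothetical metric (not from the leaderboard): percentage of the
    SLM-only to LLM-only success-rate gap that a method closes."""
    return 100.0 * (success - slm) / (llm - slm)


def summarize(results):
    """Return (method, gap recovered in %, LLM utilization in %) per method."""
    return [
        (name, round(gap_recovered(success), 1), utilization)
        for name, (success, utilization) in results.items()
    ]


if __name__ == "__main__":
    for name, recovered, utilization in summarize(RESULTS):
        print(f"{name:<17} gap recovered: {recovered:5.1f}%   LLM utilization: {utilization:5.1f}%")
```

Under this reading, a method that closes most of the gap at low LLM utilization is the more economical choice; the comparison says nothing about latency or cost per call, which the entries do not list.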