Terminal Task Execution on Terminal-Bench 1.0
[Figure: Accuracy chart. Best result: Claude Sonnet 4.5, accuracy 51 (Apr 28, 2026)]
Evaluation Results
Method             Model category  Date     Accuracy
Claude Sonnet 4.5  Proprietary     2026.04  51.0
Claude Opus 4.1    Proprietary     2026.04  43.8
Claude Haiku 4.5   Proprietary     2026.04  41.8
GPT-5              Proprietary     2026.04  41.3
Claude Opus 4      Proprietary     2026.04  39.0
Grok 4             Open-Source     2026.04  39.0
Claude Sonnet 4    Proprietary     2026.04  36.4
Qwen3-32B + SS     Open-Source     2026.04  33.8
Grok 4 Fast        Open-Source     2026.04  31.3
GPT-5-Mini         Proprietary     2026.04  30.8
Qwen3-14B + SS     Open-Source     2026.04  22.9
Qwen3-8B + SS      Open-Source     2026.04  17.1