Terminal task completion on Terminal-bench 1.0
Best Pass@1: 51 (Claude Sonnet 4.5)
[Chart: Pass@1 over time; date shown: Feb 11, 2026]
Updated 4d ago
Evaluation Results
Model                           Method                     Date     Pass@1
Claude Sonnet 4.5               OpenSource=false, Agen...  2026.02  51
LiberCoder-235B-A22B            OpenSource=true, Agent...  2026.02  46.1
Claude Opus 4.1                 OpenSource=false, Agen...  2026.02  43.8
Claude Sonnet 4.5               OpenSource=false, Agen...  2026.02  42.7
Minimax-M2                      OpenSource=true, Agent...  2026.02  42
Claude Haiku 4.5                OpenSource=false, Agen...  2026.02  41.8
Claude Sonnet 4                 OpenSource=false, Agen...  2026.02  41.3
GLM-4.6                         OpenSource=true, Agent...  2026.02  40.5
Grok 4                          OpenSource=false, Agen...  2026.02  39
Qwen3-Coder-480B-A35B-Instruct  OpenSource=true, Agent...  2026.02  39
LiberCoder-32B                  OpenSource=true, Agent...  2026.02  38.9
Qwen3-Coder-30B-A3B-Instruct    OpenSource=true, Agent...  2026.02  31.3
Kimi-K2-Instruct                OpenSource=true, Agent...  2026.02  30
Qwen3-Coder-30B-A3B-Instruct    OpenSource=true, Agent...  2026.02  26.5
Gemini 2.5 Pro                  OpenSource=false, Agen...  2026.02  25.3
Qwen3-235B-A22B-Instruct        OpenSource=true, Agent...  2026.02  25
Qwen3-32B                       OpenSource=true, Agent...  2026.02  10.3