Coding on Terminal-Bench 2.0
[Chart: Score and Verified Score over time; top score 59.3 (Claude Opus 4.5); date shown: Feb 17, 2026]
Updated 4d ago
Evaluation Results
| Method | Method details | Date | Score | Verified Score |
|---|---|---|---|---|
| Claude Opus 4.5 | Agent Framework=Termin... | 2026.02 | 59.3 | - |
| Claude Opus 4.5 | Agent Framework=Claude... | 2026.02 | 57.9 | - |
| GLM-5 | Agent Framework=Termin... | 2026.02 | 56.2 | 60.7 |
| GLM-5 | Agent Framework=Claude... | 2026.02 | 56.2 | 61.1 |
| Gemini 3 Pro | Agent Framework=Termin... | 2026.02 | 54.2 | - |
| GPT-5.2 (xhigh) | Agent Framework=Termin... | 2026.02 | 54 | - |
| Kimi K2.5 | Agent Framework=Termin... | 2026.02 | 50.8 | - |
| DeepSeek-V3.2 | Agent Framework=Claude... | 2026.02 | 46.4 | - |
| GLM-4.7 | Agent Framework=Termin... | 2026.02 | 41 | - |
| DeepSeek-V3.2 | Agent Framework=Termin... | 2026.02 | 39.3 | - |
| GLM-4.7 | Agent Framework=Claude... | 2026.02 | 32.8 | - |