End-to-end terminal tasks on Terminal-Bench 2
[Chart: Score by date. Top result: GPT-5 at 49.6 (Dec 4, 2025).]
Evaluation Results

Method                Model type      Date     Score
GPT-5                 Proprietary     2025.12  49.6
Claude-Sonnet-4.5     Proprietary     2025.12  42.8
Kimi-K2-thinking      Open Source...  2025.12  35.7
Gemini-2.5-pro        Proprietary     2025.12  32.6
DeepSeek-V3.1-Nex-N1  Open Source...  2025.12  31.8
Minimax-M2            Open Source...  2025.12  30
GLM-4.6               Open Source...  2025.12  24.5
DeepSeek-V3.1         Open Source...  2025.12  22.2
Qwen3-32B-Nex-N1      Open Source...  2025.12  16.7
Qwen3-30B-A3B-Nex-N1  Open Source...  2025.12  8.3
Qwen3-32B             Open Source...  2025.12  7.9
Qwen3-30B-A3B         Open Source...  2025.12  6
InternLM3-8B-Nex-N1   Open Source...  2025.12  0