Terminal Capability Evaluation on Terminal-Bench 2.0
[Chart: Accuracy. Top point: Nemotron-T-32B, 27.4 (Feb 24, 2026)]
Evaluation Results
Method                Model Category   Date      Accuracy
Nemotron-T-32B        Open So...       2026.02   27.4
GPT-5-Mini            Closed...        2026.02   24
Qwen3-Coder           Open So...       2026.02   23.9
Grok 4                Closed...        2026.02   23.1
Qwen3-Max-Thinking    Closed...        2026.02   22.5
Nemotron-T-14B        Open So...       2026.02   20.2
GPT-OSS (high) 120B   Open So...       2026.02   18.7
Gemini 2.5 Flash      Closed...        2026.02   16.9
Grok Code Fast 1      Closed...        2026.02   14.2
GPT-5-Nano            Closed...        2026.02   7.9
Qwen3-32B             Open So...       2026.02   3.37
GPT-OSS (high) 20B    Open So...       2026.02   3.1