Terminal Capability Evaluation on Terminal-Bench 2.0

Accuracy: 27.4

Nemotron-T-32B

Updated 1mo ago

Evaluation Results

Method	Links	Accuracy
Nemotron-T-32B 2026.02		27.4
GPT-5-Mini 2026.02		24
Qwen3-Coder 2026.02		23.9
Grok 4 2026.02		23.1
Qwen3-Max-Thinking 2026.02		22.5
Nemotron-T-14B 2026.02		20.2
GPT-OSS (high) 120B 2026.02		18.7
Gemini 2.5 Flash 2026.02		16.9
Grok Code Fast 1 2026.02		14.2
GPT-5-Nano 2026.02		7.9
Qwen3-32B 2026.02		3.37
GPT-OSS (high) 20B 2026.02		3.1