Command-line Interface Tasks on Terminal-Bench 2.0
[Chart: Terminus2 JSON Score by date (date shown: Feb 28, 2026). Highlighted result: Claude-Opus-4.5, 57.3. Available series: Terminus2 JSON Score, Terminus2 XML Score, ClaudeCode Score, QwenCode Score.]
Updated 1mo ago
Evaluation Results
Method             Size      Date     Terminus2 JSON Score  Terminus2 XML Score  ClaudeCode Score  QwenCode Score
Claude-Opus-4.5    ?         2026.02  57.3                  58.4                 53.9              51.7
Claude-Sonnet-4.5  ?         2026.02  51.7                  51.7                 41.6              37.1
Kimi-K2.5          1000A32   2026.02  49.4                  38.8                 9                 27.5
DeepSeek-V3.2      671A37    2026.02  39.3                  34.8                 -                 -
GLM-4.7            358A32    2026.02  37.1                  44.9                 -                 31.5
Qwen3-Coder-Next   80A3      2026.02  36.2                  34.2                 30.9              25.8
MiniMax-M2.1       230A10    2026.02  32.6                  -                    42.7              39.3