Agentic Reasoning on Terminal Bench hard
[Chart: Success Rate by date (Apr 14, 2026); top result 26.8, Qwen3.5-122B-A10B]
Updated 4d ago
Evaluation Results
Method              Date      Success Rate
Qwen3.5-122B-A10B   2026.04   26.8
Nemotron 3 Super    2026.04   25.78
GPT-OSS-120B        2026.04   24