Agentic Reasoning on Terminal Bench Core 2.0

37.5 Success Rate

Qwen3.5-122B-A10B

Updated 3mo ago

Evaluation Results

Method	Links	Success Rate
Qwen3.5-122B-A10B 2026.04		37.5
Nemotron 3 Super 2026.04		31
GPT-OSS-120B 2026.04		18.7