Agentic Task Completion on Terminal-Bench Med. 2 (55 tasks)

88.2 Pass@1

AHE

Updated 2mo ago

Evaluation Results

Method	Links	Pass@1
AHE 2026.04		88.2
Codex 2026.04		80
TF-GRPO 2026.04		79.4
NexAU0 2026.04		78.2
ACE 2026.04		78.2
terminus-2 2026.04		74.5
opencode 2026.04		52.7
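
The Pass@1 column can be read as the percentage of the 55 tasks solved on the first attempt. The sketch below shows that arithmetic under stated assumptions: the function name `pass_at_1` and the input list are hypothetical, and the leaderboard does not say whether its scores come from a single run or an average over several runs.

```python
def pass_at_1(results):
    """Return the percentage of tasks solved on the first attempt.

    `results` holds one boolean per task: True if the first attempt passed.
    This is an illustrative helper, not the benchmark's own scoring code.
    """
    if not results:
        raise ValueError("no task results")
    # True counts as 1 and False as 0, so sum() gives the number of passes.
    return round(100.0 * sum(results) / len(results), 1)


# Hypothetical run: 29 of 55 tasks solved on the first attempt.
hypothetical = [True] * 29 + [False] * 26
print(pass_at_1(hypothetical))  # 52.7
```

With 55 tasks, one additional solved task moves a single-run score by about 1.8 points, so scores that are not multiples of 1/55 are likely averaged over more than one run.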