Agentic Task Completion on Terminal-Bench Med. 2 (55 tasks)
[Chart: Pass@1 by method. Top point: AHE, 88.2 (Apr 28, 2026).]
Evaluation Results
Method       Setting                      Date      Pass@1
AHE          Base Model=GPT-5.4 (hi...    2026.04   88.2
Codex        Base Model=GPT-5.4 (hi...    2026.04   80
TF-GRPO      Base Model=GPT-5.4 (hi...    2026.04   79.4
NexAU0       Base Model=GPT-5.4 (hi...    2026.04   78.2
ACE          Base Model=GPT-5.4 (hi...    2026.04   78.2
terminus-2   Base Model=GPT-5.4 (hi...    2026.04   74.5
opencode     Base Model=GPT-5.4 (hi...    2026.04   52.7