Agentic Task Completion on Terminal-Bench All 2
[Chart: Pass@1 over time. Latest point: AHE, 77 Pass@1, Apr 28, 2026]
Updated 1mo ago
Evaluation Results
Method      Base Model       Date      Pass@1
AHE         GPT-5.4 (hi...   2026.04   77
TF-GRPO     GPT-5.4 (hi...   2026.04   72.3
Codex       GPT-5.4 (hi...   2026.04   71.9
NexAU0      GPT-5.4 (hi...   2026.04   69.7
ACE         GPT-5.4 (hi...   2026.04   68.9
terminus-2  GPT-5.4 (hi...   2026.04   62.9
opencode    GPT-5.4 (hi...   2026.04   47.2