Agentic Task Completion on Terminal-Bench Easy 4 tasks 2
[Chart: Pass@1 by date. Labeled point: TF-GRPO, Pass@1 = 100, Apr 28, 2026.]
Updated 1mo ago
Evaluation Results
| Method     | Details                   | Date    | Pass@1 |
|------------|---------------------------|---------|--------|
| TF-GRPO    | Base Model=GPT-5.4 (hi... | 2026.04 | 100    |
| AHE        | Base Model=GPT-5.4 (hi... | 2026.04 | 100    |
| ACE        | Base Model=GPT-5.4 (hi... | 2026.04 | 91.7   |
| NexAU0     | Base Model=GPT-5.4 (hi... | 2026.04 | 87.5   |
| opencode   | Base Model=GPT-5.4 (hi... | 2026.04 | 75     |
| terminus-2 | Base Model=GPT-5.4 (hi... | 2026.04 | 75     |
| Codex      | Base Model=GPT-5.4 (hi... | 2026.04 | 75     |