Success Prediction on Terminal-Bench 2.0 (held-out agent data)
[Chart: AUC-ROC by date. Highlighted result: Oracle, AUC-ROC 0.933 (Apr 1, 2026)]
Updated 17d ago
Evaluation Results
Method      Date      AUC-ROC
Oracle      2026.04   0.933
IRT-Agent   2026.04   0.921
Baseline    2026.04   0.842