Success Prediction on SWE-bench Verified (held-out agent data)
[Chart: AUC-ROC by method. Highlighted point: Oracle, 0.949 (Apr 1, 2026).]
Updated 17d ago
Evaluation Results
Method      Date      AUC-ROC
Oracle      2026.04   0.949
IRT-Agent   2026.04   0.936
Baseline    2026.04   0.845