Code Repair on SWE-bench Lite
Chart (r): top result ADARUBRIC-DA, r = 0.77, Mar 22, 2026. Available metrics: r, Alpha (α), Resolution Rate (%).
Evaluation Results
Method           Setting                    Date     r     Alpha (α)  Resolution Rate (%)
ADARUBRIC-DA     Backbone=Llama-3.1-8B-...  2026.03  0.77  0.84       14.7
ADARUBRIC-WM     Backbone=Llama-3.1-8B-...  2026.03  0.72  0.82       12.4
GPT-4 Direct     Backbone=Llama-3.1-8B-...  2026.03  0.59  0.68       9.8
Prometheus       Backbone=Llama-3.1-8B-...  2026.03  0.56  0.7        9.1
G-Eval (GPT-4o)  Backbone=Llama-3.1-8B-...  2026.03  0.51  0.63       8.2