Automated Program Repair on SWE-bench Verified 500 instances
[Chart: Pass@1 Rate by date; top entry SWE-RL (41), May 9, 2026]
Evaluation Results
| Method | Details | Date | Pass@1 Rate (%) |
|---|---|---|---|
| SWE-RL | Backbone=Llama-3-70B,... | 2026.05 | 41.0 |
| BOOSTAPR (+ Rline) | Params=32B | 2026.05 | 40.7 |
| Agentless | Backbone=GPT-4o | 2026.05 | 38.8 |
| + Stage III (PPO, Rseq only) | Params=32B | 2026.05 | 38.3 |
| SWE-agent | Backbone=Claude 3.5 So... | 2026.05 | 33.6 |
| SWE-Fixer | Backbone=Qwen2.5-72B,... | 2026.05 | 33.0 |
| SWE-Gym | Backbone=Qwen2.5-Coder... | 2026.05 | 32.0 |
| Lingma SWE-GPT | Backbone=Qwen2.5-72B,... | 2026.05 | 30.2 |
| AutoCodeRover | Backbone=GPT-4o | 2026.05 | 28.8 |
| + Stage I (SFT) | Params=32B | 2026.05 | 23.4 |
| ChatRepair | Backbone=GPT-3.5-turbo | 2026.05 | 18.2 |
| Qwen2.5-Coder-32B (base) | Params=32B | 2026.05 | 17.8 |
| RLEF | Backbone=Llama-3-8B, P... | 2026.05 | 12.6 |
| RepairLLaMA | Backbone=CodeLlama-7B,... | 2026.05 | 8.6 |
| CodeRL | Backbone=CodeT5-large,... | 2026.05 | 3.2 |
| KNOD | Backbone=CodeT5-base,... | 2026.05 | 2.4 |
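The page does not state how its Pass@1 Rate is computed. As a hedged illustration only, the sketch below assumes the common unbiased pass@k estimator from the code-generation literature, averaged over benchmark instances and scaled to a percentage. The function names and the (samples, correct) input format are hypothetical, not taken from this leaderboard or any submission.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one instance (assumed definition).

    n: number of candidate patches generated for the instance
    c: number of those patches that resolve it (pass the instance's tests)
    k: number of attempts allowed
    Returns the probability that at least one of k patches, drawn without
    replacement from the n candidates, is correct.
    """
    if n - c < k:
        # Every possible draw of k patches contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Pass@1 rate in percent: mean per-instance pass@1 over all instances.

    results: hypothetical (n_samples, n_correct) pairs, one per instance.
    """
    if not results:
        raise ValueError("no instances to score")
    total = sum(pass_at_k(n, c, 1) for n, c in results)
    return 100.0 * total / len(results)
```

With a single attempt per instance, pass@1 reduces to resolved instances divided by the instance count; for example, a hypothetical run that resolves 205 of the 500 SWE-bench Verified instances scores `benchmark_pass_at_1([(1, 1)] * 205 + [(1, 0)] * 295)`, which is 41.0. With several samples per instance, the per-instance value becomes c/n, so the benchmark-level rate need not correspond to a whole number of instances.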