Automated Program Repair on HumanEval Java (164 tasks)
Top result: BOOSTAPR (+ Rline), 84.5 Pass@1 Rate (May 9, 2026)
Evaluation Results
| Method | Model details | Date | Pass@1 Rate |
|---|---|---|---|
| BOOSTAPR (+ Rline) | Params=32B | 2026.05 | 84.5 |
| + Stage III (PPO, Rseq only) | Params=32B | 2026.05 | 79.4 |
| SWE-RL | Backbone=Llama-3-70B,... | 2026.05 | 76.2 |
| RLEF | Backbone=Llama-3-8B, P... | 2026.05 | 74.3 |
| SWE-Fixer | Backbone=Qwen2.5-72B,... | 2026.05 | 73.8 |
| + Stage I (SFT) | Params=32B | 2026.05 | 73.1 |
| Lingma SWE-GPT | Backbone=Qwen2.5-72B,... | 2026.05 | 72.6 |
| ChatRepair | Backbone=GPT-3.5-turbo | 2026.05 | 72.0 |
| Agentless | Backbone=GPT-4o | 2026.05 | 71.3 |
| SWE-Gym | Backbone=Qwen2.5-Coder... | 2026.05 | 70.7 |
| SWE-agent | Backbone=Claude 3.5 So... | 2026.05 | 68.9 |
| RepairLLaMA | Backbone=CodeLlama-7B,... | 2026.05 | 66.5 |
| AutoCodeRover | Backbone=GPT-4o | 2026.05 | 65.2 |
| Qwen2.5-Coder-32B (base) | Params=32B | 2026.05 | 64.0 |
| CodeRL | Backbone=CodeT5-large,... | 2026.05 | 63.0 |
| KNOD | Backbone=CodeT5-base,... | 2026.05 | 58.5 |