BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

About

Reinforcement learning for program repair is hindered by sparse execution feedback and by coarse sequence-level rewards that obscure which edits actually fix bugs. We present BoostAPR, a three-stage framework that addresses these challenges: (1) supervised fine-tuning on execution-verified demonstrations with reasoning traces, (2) training dual reward models (a sequence-level assessor and a line-level credit allocator) from execution outcomes, and (3) PPO optimization in which the line-level model redistributes rewards to critical edit regions. Line-level credit assignment operates at an intermediate granularity, between whole sequences and individual tokens, that is naturally suited to code changes. Trained on SWE-Gym and evaluated on four benchmarks, BoostAPR achieves 40.7% on SWE-bench Verified (+22.9pp over the base model), 24.8% on Defects4J (Python-to-Java transfer), 84.5% on HumanEval-Java, and 95.0% on QuixBugs. These results are competitive among open-source models and show strong cross-language generalization.
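
To make the idea of line-level reward redistribution concrete, here is a minimal sketch under stated assumptions. The abstract does not say how BoostAPR weights lines or maps line rewards onto tokens, so the softmax weighting, the `temperature` parameter, the end-of-line reward placement, and both function names are hypothetical choices for illustration only, not the authors' method.

```python
import math
from typing import List


def redistribute_reward(seq_reward: float,
                        line_scores: List[float],
                        temperature: float = 1.0) -> List[float]:
    """Split one sequence-level reward across lines.

    Assumption: lines are weighted by a softmax over line-level scores,
    so the per-line rewards always sum to seq_reward.
    """
    if not line_scores:
        return []
    top = max(line_scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in line_scores]
    total = sum(exps)
    return [seq_reward * e / total for e in exps]


def per_token_rewards(line_rewards: List[float],
                      tokens_per_line: List[int]) -> List[float]:
    """Expand per-line rewards into a per-token reward vector.

    Assumption: each line's reward is placed on the last token of that
    line and every other token receives zero.
    """
    if len(line_rewards) != len(tokens_per_line):
        raise ValueError("need one token count per line")
    rewards: List[float] = []
    for reward, n_tokens in zip(line_rewards, tokens_per_line):
        if n_tokens < 1:
            raise ValueError("each line must contain at least one token")
        rewards.extend([0.0] * (n_tokens - 1) + [reward])
    return rewards


# Hypothetical patch with three lines; the second line scores highest.
line_rewards = redistribute_reward(1.0, [0.1, 2.0, 0.3])
token_rewards = per_token_rewards(line_rewards, [4, 6, 3])
```

In this sketch the highest-scoring line receives the largest share of the reward, and the total reward is unchanged. A PPO trainer would then consume `token_rewards` in place of a single reward at the end of the sequence.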

Yuanhao Li, Hongbo Wang, Xiaotang Shang, Xunzhu Tang, Yiming Cao, Xuhong Chen • 2026

Related benchmarks

| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Automated Program Repair | Defects4J v2.0 (835 bugs) | Pass@1 | 24.8 | 16 |
| Automated Program Repair | HumanEval-Java (164 tasks) | Pass@1 Rate | 84.5 | 16 |
| Automated Program Repair | SWE-bench Verified (500 instances) | Pass@1 Rate | 40.7 | 16 |
| Automated Program Repair | QuixBugs-Java (40 bugs) | Pass@1 Rate | 95.0 | 16 |
