
Efficient Post-training of LLMs for Code Generation With Offline Reinforcement Learning

About

Post-training using online reinforcement learning (RL) is an important training step for LLMs, including code-generating models. However, online RL for code generation involves LLM inference and verification of the generated output, which can take considerable time and resources. In this paper, we explore the application of offline RL to code-generating models by leveraging existing code datasets. Our experiments demonstrate that offline RL is an effective training strategy for improving LLM performance. We show that offline RL can be especially beneficial for small LLMs and challenging coding problems.

Mingze Wu, Abhinav Anand, Shweta Verma, Mira Mezini • 2026

Related benchmarks

Task             Dataset            Metric                 Result  Rank
Code Generation  APPS Introductory  pass@1                 7.72    25
Code Generation  APPS+ Interview    pass@1                 12.14   5
Code Generation  APPS+ Competition  pass@1                 2.67    5
Code Generation  APPS+              pass@1 (Introductory)  1.67    5
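
The results above are reported as pass@1. As a rough illustration of what that metric measures, the sketch below uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of sampled solutions for a problem and c is the number that pass all tests. This page does not state which evaluation code the authors used, so treat this as an assumption; the function names pass_at_k and mean_pass_at_k are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of sampled solutions, c: number that pass all tests,
    k: sample budget. Assumes the standard unbiased estimator; the
    paper's own evaluation code may differ.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset contains no passing sample.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the fraction of sampled solutions that pass, so mean_pass_at_k is the average per-problem success rate. Whether the table values are this fraction scaled to a percentage is not stated on this page.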
