Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

About

Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities, a phenomenon classically known as catastrophic forgetting. In this paper, toward identifying guidelines for mitigating this phenomenon, we systematically compare the forgetting patterns of two widely adopted post-training methods: supervised fine-tuning (SFT) and reinforcement learning (RL). Our experiments reveal a consistent trend across LM families (Llama, Qwen) and tasks (instruction following, general knowledge, and arithmetic reasoning): RL leads to less forgetting than SFT while achieving comparable or higher target task performance. To investigate the cause of this difference, we consider a simplified setting in which the LM is modeled as a mixture of two distributions, one corresponding to prior knowledge and the other to the target task. We identify that the mode-seeking nature of RL, which stems from its use of on-policy data, enables keeping prior knowledge intact when learning the target task. We then verify this insight by demonstrating that the use of on-policy data underlies the robustness of RL to forgetting in practical settings, as opposed to other algorithmic choices such as KL regularization or advantage estimation. Lastly, as a practical implication, our results highlight the potential of mitigating forgetting using approximately on-policy data, which can be substantially more efficient to obtain than fully on-policy data.
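The "mode-seeking" behavior mentioned in the abstract can be illustrated with a textbook toy example. The sketch below is not the paper's mixture setting or code. It relies on two standard facts: maximum-likelihood training on a fixed dataset (as in SFT) minimizes a forward KL divergence, and KL-regularized RL can be viewed as minimizing a reverse KL divergence to a reward-tilted target. Every specific here is an assumption chosen for illustration: the bimodal target, the single-Gaussian model, the grid, and the candidate means.

```python
import math

def discretized_mixture(grid, means, weights, sigma=1.0):
    """Gaussian mixture evaluated on a grid and normalized to sum to 1."""
    dens = [
        sum(w * math.exp(-0.5 * ((x - m) / sigma) ** 2) for m, w in zip(means, weights))
        for x in grid
    ]
    total = sum(dens)
    return [d / total for d in dens]

def kl(p, q):
    """KL(p || q) for discrete distributions on the same grid."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical setup: a bimodal target with modes at -3 and +3,
# approximated by a single unit-variance Gaussian with mean mu.
grid = [i * 0.1 for i in range(-100, 101)]
target = discretized_mixture(grid, means=[-3.0, 3.0], weights=[0.5, 0.5])
candidate_means = [i * 0.5 for i in range(-10, 11)]

def forward_kl(mu):
    # Expectation under the target: the off-policy, maximum-likelihood view.
    return kl(target, discretized_mixture(grid, [mu], [1.0]))

def reverse_kl(mu):
    # Expectation under the model itself: the on-policy view.
    return kl(discretized_mixture(grid, [mu], [1.0]), target)

mu_forward = min(candidate_means, key=forward_kl)
mu_reverse = min(candidate_means, key=reverse_kl)

print("forward-KL minimizer:", mu_forward)
print("reverse-KL minimizer:", mu_reverse)
```

In this toy, the forward-KL minimizer sits between the two modes to cover both (mass-covering), while the reverse-KL minimizer commits to a single mode (mode-seeking). How this asymmetry translates into less forgetting when the LM is itself modeled as a mixture is the subject of the paper's analysis and is not reproduced here.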

Howard Chen, Noam Razin, Karthik Narasimhan, Danqi Chen • 2025

Related benchmarks

Task	Dataset	Result
Instruction Following	IFEval	IFEval Accuracy 52.6	854
Mathematical Reasoning	Countdown	Accuracy 29	252
Coding	MBPP	Accuracy 56.9	175
Coding	HumanEval	Pass@1 71.3	168
Coding	HumanEval+	Pass@1 64.6	164
Coding	MBPP+	Pass@1 63.8	117
Instruction Following	IFEval (test)	--	92
Mathematical Reasoning	COUNTDOWN (test)	Accuracy 20.9	84
Coding	HumanEval	Accuracy 57.1	84
Coding	MBPP	Pass@1 Accuracy 74.1	78

Showing 10 of 22 rows
