RL Fine-Tuning Heals OOD Forgetting in SFT

About

Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) is a standard post-training recipe for improving Large Language Model (LLM) reasoning, but why it works remains unclear. We revisit the common claim that "SFT memorizes, RL generalizes" through checkpoint-wise analyses of in-distribution (ID) and out-of-distribution (OOD) reasoning. We find that OOD performance often peaks early during SFT and then declines despite continued improvement in ID reasoning. RL typically does not surpass this early SFT peak; rather, it restores OOD capability lost during later SFT, and only from a bounded range of SFT checkpoints. Further spectral analysis shows that this forgetting-and-recovery pattern correlates with rotations of singular vectors, while singular values remain largely stable. These findings suggest a more precise view of post-training dynamics: SFT can forget, RL can recover, and controlling singular-vector rotation may improve OOD robustness. Code is available at https://github.com/jinhangzhan/RL_Heals_SFT.

Hangzhan Jin, Sitao Luan, Tianwei Ni, Sicheng Lyu, Guillaume Rabusseau, Reihaneh Rabbany, Doina Precup, Mohammad Hamdaqa • 2025
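
The spectral comparison mentioned in the abstract (singular values staying stable while singular vectors rotate) can be illustrated with a small NumPy sketch. This is not the authors' code: the function name `spectral_drift`, the choice of the top-k left singular subspace, and the use of principal angles as the rotation measure are all assumptions made for illustration, and the matrices are synthetic.

```python
import numpy as np

def spectral_drift(w_before, w_after, k=8):
    """Compare two weight matrices of the same shape via their SVDs.

    Returns (relative change in singular values,
             principal angles in radians between the top-k left singular subspaces).
    Hypothetical helper for illustration; not taken from the paper's repository.
    """
    u0, s0, _ = np.linalg.svd(w_before, full_matrices=False)
    u1, s1, _ = np.linalg.svd(w_after, full_matrices=False)
    # How much the spectrum itself moved, relative to its original norm.
    sv_change = np.linalg.norm(s1 - s0) / np.linalg.norm(s0)
    # Cosines of the principal angles are the singular values of U0_k^T U1_k.
    cosines = np.linalg.svd(u0[:, :k].T @ u1[:, :k], compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return sv_change, angles

# Synthetic demo (assumed data): rotate the output space of a matrix.
rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.standard_normal((64, 32)))   # orthonormal columns
v, _ = np.linalg.qr(rng.standard_normal((32, 32)))   # orthogonal matrix
s = np.linspace(10.0, 1.0, 32)                       # distinct singular values
w = u @ np.diag(s) @ v.T
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))   # random orthogonal rotation
w_rotated = q @ w                                    # same spectrum, rotated vectors

sv_change, angles = spectral_drift(w, w_rotated)
print("relative singular-value change:", sv_change)
print("largest principal angle (rad):", angles.max())
```

In this toy setting the singular values are unchanged by construction while the subspace angles are large, which is the qualitative signature the abstract describes. Applying the same measurement to real SFT and RL checkpoints would require loading the corresponding weight matrices layer by layer.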

Related benchmarks

Task	Dataset	Result
General Knowledge	MMLU-Pro	Accuracy: 73	67
Reasoning	ARC Challenge	Accuracy (ARC): 0.92	56
Instruction Following	IFEval loose	--	18
