Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control

About

Post-training of large language models (LLMs) often suffers from catastrophic forgetting, where improvements on a target objective degrade previously acquired capabilities. Recent evidence suggests that this phenomenon is primarily driven by excessive distributional drift during optimization. Motivated by this perspective, we propose Anchored Learning, a simple framework that explicitly controls distributional updates during offline fine-tuning via a dynamically evolving moving anchor. Instead of matching a fixed reference distribution, the anchor interpolates between the current model and a frozen reference to construct an intermediate target that the model distills toward, transforming global fine-tuning into a sequence of local trust-region updates in distribution space. Theoretically, we prove that this anchor-based update admits a linear KL-divergence upper bound per iteration, ensuring a stable transition between model distributions. Extensive experiments on iGSM, MedCalc, and IFEval show that Anchored Learning consistently lies on the Pareto frontier of gain-stability trade-offs, achieving near-optimal performance improvements while substantially reducing degradation compared to strong baselines. For example, while standard SFT suffers from over 53% performance degradation on iGSM and MedCalc, Anchored Learning reduces this drop to under 5% while maintaining near-optimal gains (e.g., 75.2% on iGSM).
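
The moving-anchor idea in the abstract can be illustrated on a toy categorical distribution. The sketch below is not the authors' implementation: the abstract does not give the interpolation formula, so the arithmetic-mixture form, the step size `alpha`, and the function names `interpolate_anchor` and `kl_divergence` are all assumptions made for illustration. Under that assumed form, convexity of the KL divergence in its first argument gives a bound on the per-step drift that is linear in `alpha`. This is a standard inequality and may differ from the bound proved in the paper.

```python
import math


def interpolate_anchor(p_current, p_reference, alpha):
    """Hypothetical anchor: (1 - alpha) * current + alpha * reference.

    The mixture form is an assumption; the abstract only says the anchor
    "interpolates" between the current model and a frozen reference.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * c + alpha * r
            for c, r in zip(p_current, p_reference)]


def kl_divergence(p, q):
    """KL(p || q) for categorical distributions (terms with p_i == 0 are skipped)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Toy next-token distributions over a three-symbol vocabulary (made-up values).
p_current = [0.70, 0.20, 0.10]
p_reference = [0.10, 0.30, 0.60]
alpha = 0.1

# Intermediate target the current model would distill toward in this step.
anchor = interpolate_anchor(p_current, p_reference, alpha)

# Drift of a single step, measured from the anchor to the current model.
step_kl = kl_divergence(anchor, p_current)

# Convexity of KL in its first argument:
# KL(anchor || current) <= alpha * KL(reference || current).
linear_bound = alpha * kl_divergence(p_reference, p_current)

print(f"anchor       = {anchor}")
print(f"step KL      = {step_kl:.6f}")
print(f"linear bound = {linear_bound:.6f}")
```

With a small `alpha`, each distillation target stays close to the current model, which is the trust-region intuition the abstract describes. How the anchor is combined with the fine-tuning data is not specified in the abstract and is left out of this sketch.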

Xinyu Wang, Changzhi Sun, Yuanbin Wu, Xiaoling Wang • 2026

Related benchmarks

Task	Dataset	Result
Instruction Following	IFEval	IFEval Accuracy 60	854
Mathematical Reasoning	Countdown	Accuracy 18.4	252
Coding	MBPP	Accuracy 58.7	175
Coding	HumanEval	Pass@1 70.7	168
Coding	HumanEval+	Pass@1 66.5	164
Coding	MBPP+	Pass@1 61.1	117
Instruction Following	IFEval (test)	--	92
Coding	HumanEval	Accuracy 57.3	84
Mathematical Reasoning	COUNTDOWN (test)	Accuracy 17	84
Coding	MBPP	Pass@1 Accuracy 70.4	78

Showing 10 of 25 rows
