Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing

About

Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive Reinforcement Learning (RL) frameworks. To improve efficiency, we propose Near-Policy Distillation (NPD), an asynchronous approach that decouples student generation from training. This reformulation enables Supervised Fine-Tuning (SFT) with sequence packing. However, asynchronous updates inevitably introduce policy lag and sample noise, which can cause the behavior to drift from near-policy toward off-policy. To counteract this without sacrificing efficiency, NPD integrates sparse student updates and $\Delta$-IFD filtering, a heuristic sample selection mechanism that empirically stabilizes the optimization trajectory. By filtering extreme out-of-distribution samples, $\Delta$-IFD prevents noise from dominating the gradients, keeping updates within a safe proximal learning zone. Empirically, the NPD framework achieves an 8.1x speedup over on-policy baselines and outperforms SFT by 8.09%. Crucially, by effectively narrowing the exploration space for subsequent RL, our method enables openPangu-Embedded-1B to reach a state-of-the-art score of 68.73%, outperforming the substantially larger Qwen3-1.7B. Code will be released soon.
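The abstract does not define $\Delta$-IFD or the packing procedure, so the following is only a minimal sketch of the general idea of selecting samples and then packing them for SFT. It assumes each student-generated sample already carries a scalar score (a hypothetical stand-in for $\Delta$-IFD), keeps samples whose score lies in an assumed band, and packs the survivors with a generic first-fit-decreasing heuristic. The names `Sample`, `select`, `pack`, the thresholds, and the toy data are all illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    tokens: List[int]  # tokenized student-generated sequence
    score: float       # hypothetical per-sample score standing in for Delta-IFD


def select(samples: List[Sample], low: float, high: float) -> List[Sample]:
    """Keep samples whose score lies in [low, high]; drop extreme outliers."""
    return [s for s in samples if low <= s.score <= high]


def pack(samples: List[Sample], max_len: int) -> List[List[Sample]]:
    """First-fit-decreasing packing into bins of at most max_len tokens."""
    bins: List[List[Sample]] = []
    for s in sorted(samples, key=lambda x: len(x.tokens), reverse=True):
        if len(s.tokens) > max_len:
            continue  # cannot fit in any pack; skipped in this sketch
        for b in bins:
            if sum(len(x.tokens) for x in b) + len(s.tokens) <= max_len:
                b.append(s)
                break
        else:
            bins.append([s])  # no existing bin had room, so open a new one
    return bins


# Toy data: scores and thresholds are made up for illustration.
samples = [
    Sample(tokens=[1] * 5, score=0.2),
    Sample(tokens=[2] * 3, score=0.5),
    Sample(tokens=[3] * 4, score=9.0),  # treated as an extreme outlier
    Sample(tokens=[4] * 2, score=0.1),
]
kept = select(samples, low=0.0, high=1.0)
packs = pack(kept, max_len=8)
pack_lengths = [sum(len(s.tokens) for s in p) for p in packs]
```

In a real SFT pipeline each pack would also need attention masking or position resets so that packed sequences do not attend to one another; that detail is omitted here.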

Miao Rang, Zhenni Bi, Hang Zhou, Kai Han, Xuechun Wang, An Xiao, Xinghao Chen, Yunhe Wang, Hanting Chen • 2026

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	MATH 500	Accuracy (Acc)	84.76	600
Language Understanding	CEval	Accuracy	67.13	67
Language Understanding	CMMLU	Accuracy	56.53	65
Reading Comprehension	DROP	F1 Score	69.18	35
Instruction Following	IF-Eval	Prompt Strict Accuracy	65.43	22
Expert-Level Reasoning	GPQA Diamond	Pass@1 Score	50.51	20
Overall Evaluation	Aggregate	Average Score	68.73	9
Winograd Schema Challenge	CLUEWSC	Accuracy	82.87	9
