Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
About
Diffusion-based large language models (DLLMs) offer non-sequential, block-wise generation and richer data reuse than autoregressive (AR) models, but existing code DLLMs still lag behind strong AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce Stable-DiffCoder, a block diffusion code model that reuses the Seed-Coder architecture, data, and training pipeline. To enable efficient knowledge learning and stable training, we incorporate a block diffusion continual pretraining (CPT) stage enhanced by a tailored warmup and a block-wise clipped noise schedule. Under the same data and architecture, Stable-DiffCoder overall outperforms its AR counterpart on a broad suite of code benchmarks. Relying only on the CPT and supervised fine-tuning stages, it also surpasses a wide range of ~8B AR models and DLLMs, demonstrating that diffusion-based training can improve code modeling quality beyond AR training alone. Finally, diffusion-based any-order modeling improves structured code modeling for editing and reasoning and, through data augmentation, benefits low-resource programming languages.
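To make the block-wise clipped noise schedule concrete, the following is a minimal sketch of how such corruption might look during CPT: one noise level is sampled per block and clipped away from 0 and 1, so every block retains some learning signal and no block is left entirely clean. The function name, `MASK_ID`, block size, and clip bounds are all illustrative assumptions, not the released implementation.

```python
import random

MASK_ID = -1  # hypothetical [MASK] token id (assumption, not the real vocab id)

def blockwise_clipped_noise_mask(tokens, block_size=32,
                                 clip_min=0.1, clip_max=0.9, rng=None):
    """Sketch: sample one clipped mask ratio per block, then mask
    each token in that block independently at that ratio."""
    rng = rng or random.Random(0)
    noised = list(tokens)
    mask = [False] * len(tokens)
    for start in range(0, len(tokens), block_size):
        # Clip the per-block noise level into [clip_min, clip_max].
        ratio = min(max(rng.random(), clip_min), clip_max)
        for i in range(start, min(start + block_size, len(tokens))):
            if rng.random() < ratio:
                noised[i] = MASK_ID
                mask[i] = True
    return noised, mask
```

The model would then be trained to recover the original tokens at the masked positions, block by block, rather than strictly left to right.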
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval (test) | -- | -- | 444 |
| Code Generation | MBPP (test) | -- | -- | 276 |
| Code Generation | MBPP | Pass@1 | 42.4 | 175 |
| Function-level Code Generation | HumanEval+ augmented (test) | Pass@1 | 82.3 | 46 |
| Function-level Code Generation | MBPP+ augmented (test) | Pass@1 | 72.8 | 45 |
| Code Generation | BigCodeBench-Completion Full | Pass@1 | 54.8 | 41 |
| Code Generation | BigCodeBench-Completion Hard | Pass@1 | 31.8 | 38 |
| CUDA Kernel Generation | KernelBench Level 1 | Execution Count | 27 | 31 |
| CUDA Kernel Generation | KernelBench Level 2 | Execution Count | 5 | 31 |
| CUDA Kernel Generation | KernelBench Level 3 | Execution Count | 10 | 31 |