Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
About
Diffusion-based large language models (DLLMs) offer non-sequential, block-wise generation and richer data reuse than autoregressive (AR) models, but existing code DLLMs still lag behind strong AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce Stable-DiffCoder, a block diffusion code model that reuses the Seed-Coder architecture, data, and training pipeline. To enable efficient knowledge learning and stable training, we incorporate a block diffusion continual pretraining (CPT) stage enhanced by a tailored warmup and a block-wise clipped noise schedule. Under the same data and architecture, Stable-DiffCoder overall outperforms its AR counterpart on a broad suite of code benchmarks. Relying only on the CPT and supervised fine-tuning stages, it also surpasses a wide range of ~8B AR models and DLLMs, demonstrating that diffusion-based training can improve code modeling quality beyond AR training alone. Finally, diffusion-based any-order modeling improves structured code modeling for editing and reasoning and, through data augmentation, benefits low-resource programming languages.
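To make the block-wise clipped noise schedule concrete, the following is a minimal sketch of how such corruption might look during CPT: one noise level is sampled per block and clipped away from 0 and 1, so every block retains some learning signal and no block is left entirely clean. The function name, `MASK_ID`, block size, and clip bounds are all illustrative assumptions, not the released implementation.

```python
import random

MASK_ID = -1  # hypothetical [MASK] token id (assumption, not the real vocab id)

def blockwise_clipped_noise_mask(tokens, block_size=32,
                                 clip_min=0.1, clip_max=0.9, rng=None):
    """Sketch: sample one clipped mask ratio per block, then mask
    each token in that block independently at that ratio."""
    rng = rng or random.Random(0)
    noised = list(tokens)
    mask = [False] * len(tokens)
    for start in range(0, len(tokens), block_size):
        # Clip the per-block noise level into [clip_min, clip_max].
        ratio = min(max(rng.random(), clip_min), clip_max)
        for i in range(start, min(start + block_size, len(tokens))):
            if rng.random() < ratio:
                noised[i] = MASK_ID
                mask[i] = True
    return noised, mask
```

The model would then be trained to recover the original tokens at the masked positions, block by block, rather than strictly left to right.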
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval (test) | -- | -- | 444 |
| Code Generation | MBPP (test) | -- | -- | 276 |
| Code Generation | MBPP | Pass@1 | 42.4 | 175 |
| Function-level Code Generation | HumanEval+ augmented (test) | Pass@1 | 82.3 | 46 |
| Function-level Code Generation | MBPP+ augmented (test) | Pass@1 | 72.8 | 45 |
| Code Generation | BigCodeBench-Completion Full | Pass@1 | 54.8 | 41 |
| Code Generation | BigCodeBench-Completion Hard | Pass@1 | 31.8 | 38 |
| CUDA Kernel Generation | KernelBench Level 1 | Execution Count | 27 | 31 |
| CUDA Kernel Generation | KernelBench Level 2 | Execution Count | 5 | 31 |
| CUDA Kernel Generation | KernelBench Level 3 | Execution Count | 10 | 31 |