dLLM: Simple Diffusion Language Modeling

About

Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations, making them difficult to reproduce or extend. As the field accelerates, there is a clear need for a unified framework that standardizes these common components while remaining flexible enough to support new methods and architectures. To address this gap, we introduce dLLM, an open-source framework that unifies the core components of diffusion language modeling -- training, inference, and evaluation -- and makes them easy to customize for new designs. With dLLM, users can reproduce, finetune, deploy, and evaluate open-source large DLMs such as LLaDA and Dream through a standardized pipeline. The framework also provides minimal, reproducible recipes for building small DLMs from scratch with accessible compute, including converting any BERT-style encoder or autoregressive LM into a DLM. We also release the checkpoints of these small DLMs to make DLMs more accessible and accelerate future research.

Zhanhui Zhou, Lingjie Chen, Hanghang Tong, Dawn Song • 2026

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Language Understanding	MMLU-Pro	Accuracy	13.8	130
Code Generation	HumanEval	Score	32.3	78
Scientific Reasoning	GPQA Diamond	Accuracy	22.2	73
Reasoning	BBH	BBH Score	41.5	53
Language Understanding	MMLU-Pro	MMLU-Pro Score	24.7	44
Mathematical Reasoning	MATH	Overall Score	32.4	43
Language Understanding	MMLU	MMLU Score	52.8	40
Code Generation	MBPP Base	Pass@1	54	12
Code Generation	HumanEval Base	Pass@1	45.7	12

