
Large Language Diffusion Models

About

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer that predicts masked tokens. It provides a principled generative approach to probabilistic inference by optimizing a likelihood lower bound. Across extensive benchmarks spanning general tasks, math, code, and other domains, LLaDA demonstrates strong scalability and performs comparably to our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings show the promise of diffusion models for language modeling at scale and challenge the common assumption that the core LLM capabilities discussed above inherently depend on ARMs. Project page and code: https://ml-gsai.github.io/LLaDA-demo/.
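The forward masking process described in the abstract can be sketched as follows. This is a simplified illustration under assumptions, not the authors' implementation: each token is masked independently with probability t (for t drawn uniformly from [0, 1] during training), and the masked cross-entropy is reweighted by 1/t to form the likelihood lower bound. The mask token id and function names here are hypothetical.

```python
import random

MASK = -1  # hypothetical mask token id


def forward_mask(tokens, t, rng=None):
    """Forward process: mask each token independently with probability t."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    return [MASK if rng.random() < t else tok for tok in tokens]


def loss_weight(t):
    """ELBO weighting: the masked-token cross-entropy is scaled by 1/t."""
    return 1.0 / t


# Toy example: at t = 0.5, roughly half the tokens are replaced by MASK;
# a Transformer would then be trained to predict the tokens at those positions.
seq = [5, 9, 2, 7, 3]
t = 0.5
noisy = forward_mask(seq, t)
masked_positions = [i for i, tok in enumerate(noisy) if tok == MASK]
```

At t = 1 every token is masked (the model must generate the whole sequence), while t near 0 leaves the sequence almost intact, which is what makes the objective a principled bound on the data likelihood.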

Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li • 2025

Related benchmarks

Task                          Dataset                    Result          Rank
Mathematical Reasoning        GSM8K                      Accuracy 78.2   983
Automatic Speech Recognition  LibriSpeech (test-other)   WER 5.22        966
Code Generation               HumanEval                  Pass@1 45.12    850
Automatic Speech Recognition  LibriSpeech clean (test)   WER 2.34        833
Mathematical Reasoning        GSM8K (test)               Accuracy 78.2   797
Language Understanding        MMLU                       Accuracy 65.9   756
Commonsense Reasoning         PIQA                       Accuracy 74.4   647
Mathematical Reasoning        MATH                       Accuracy 27.3   535
Reasoning                     BBH                        --              507
Code Generation               HumanEval (test)           --              444
Showing 10 of 184 rows
