
LLaDA2.0: Scaling Up Diffusion Language Models to 100B

About

This paper presents LLaDA2.0 -- a family of discrete diffusion large language models (dLLMs) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models -- establishing a new paradigm for frontier-scale deployment. Instead of costly training from scratch, LLaDA2.0 follows the principles of knowledge inheritance, progressive adaptation, and efficiency-aware design, and seamlessly converts a pre-trained AR model into a dLLM with a novel three-phase, block-level WSD-based training scheme: progressively increasing the block size in block diffusion (warm-up), large-scale full-sequence diffusion (stable), and reverting to a compact block size in block diffusion (decay). Together with post-training alignment via SFT and DPO, this yields LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B), two instruction-tuned Mixture-of-Experts (MoE) variants optimized for practical deployment. By preserving the advantages of parallel decoding, these models deliver superior performance and efficiency at the frontier scale. Both models are open-sourced.
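The three-phase WSD (warm-up/stable/decay) block-size progression can be sketched as a simple step-to-block-size schedule. This is a minimal illustration only: the phase fractions, sequence length, and block sizes below are assumptions, not the paper's actual hyperparameters.

```python
# Hypothetical sketch of the 3-phase block-level WSD schedule from the
# abstract: warm-up (growing block size), stable (full-sequence diffusion),
# decay (compact block size). All numbers here are illustrative assumptions.

def block_size_schedule(step, total_steps,
                        warmup_frac=0.1, decay_frac=0.1,
                        seq_len=4096, start_block=32, final_block=32):
    """Return (phase, block_size) for a given training step."""
    warmup_end = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1.0 - decay_frac))

    if step < warmup_end:
        # warm-up: linearly grow the block size toward the full sequence
        frac = step / max(warmup_end, 1)
        return "warm-up", int(start_block + frac * (seq_len - start_block))
    elif step < decay_start:
        # stable: block size equals the sequence length, i.e. full-sequence diffusion
        return "stable", seq_len
    else:
        # decay: revert to a compact block size for efficient parallel decoding
        return "decay", final_block


for s in (0, 50_000, 95_000):
    phase, bs = block_size_schedule(s, total_steps=100_000)
    print(s, phase, bs)
```

Under these assumed fractions, the schedule spends most of training in the stable full-sequence phase and ends on the compact block size used at inference.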

Tiwei Bie, Maosong Cao, Kun Chen, Lun Du, Mingliang Gong, Zhuochen Gong, Yanmei Gu, Jiaqi Hu, Zenan Huang, Zhenzhong Lan, Chengxi Li, Chongxuan Li, Jianguo Li, Zehuan Li, Huabin Liu, Lin Liu, Guoshan Lu, Xiaocheng Lu, Yuxin Ma, Jianfeng Tan, Lanning Wei, Ji-Rong Wen, Yipeng Xing, Xiaolu Zhang, Junbo Zhao, Da Zheng, Jun Zhou, Junlin Zhou, Zhanchao Zhou, Liwang Zhu, Yihong Zhuang • 2025

Related benchmarks

Task                     Dataset            Metric           Result   Rank
Instruction Following    IFEval             Accuracy         82.6     625
Code Generation          HumanEval (test)   --               --       506
Code Generation          MBPP (test)        --               --       298
Common Sense Reasoning   HellaSwag          Accuracy         82.35    213
Math                     GSM8K              Accuracy         88.48    206
Reasoning                HellaSwag (HS)     Accuracy         84.97    162
General Reasoning        MMLU               Accuracy         72.54    156
Reasoning                PIQA               Accuracy         96.5     145
General Reasoning        MMLU-Pro           Accuracy         57.1     114
Text-to-SQL              Spider             Exec Acc (All)   82.49    91

Showing 10 of 71 rows.

Other info

GitHub
