LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

About

We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous visual inputs with SigLIP-VQ, the model applies block-level masked diffusion to both text and visual tokens within the backbone, while the decoder reconstructs the visual tokens into high-fidelity images. Beyond parallel decoding, inference is further accelerated by prefix-aware optimizations in the backbone and few-step distillation in the decoder. Supported by carefully curated large-scale data and a tailored multi-stage training pipeline, LLaDA2.0-Uni matches specialized VLMs in multimodal understanding while delivering strong performance in image generation and editing. Its native support for interleaved generation and reasoning establishes a promising and scalable paradigm for next-generation unified foundation models. Code and models are available at https://github.com/inclusionAI/LLaDA2.0-Uni.
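The abstract names two mechanisms without spelling them out: turning continuous visual features into discrete token ids, and decoding a sequence block by block with masked diffusion. The sketch below is a toy illustration under stated assumptions, not the LLaDA2.0-Uni implementation. `vq_encode` uses plain nearest-neighbor vector quantization as a stand-in for SigLIP-VQ; `toy_logits` is a hypothetical random stand-in for the MoE backbone; and the unmasking schedule (commit the most confident predictions first, finishing each block in a fixed number of steps) is an assumed policy of the kind common in masked-diffusion decoders. All function names and the `MASK` id are hypothetical.

```python
import math
import random

MASK = -1  # hypothetical id for the [MASK] token


def vq_encode(features, codebook):
    """Nearest-neighbor vector quantization (toy stand-in for SigLIP-VQ):
    map each continuous feature vector to the index of the closest codebook entry."""
    ids = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        ids.append(min(range(len(codebook)), key=dists.__getitem__))
    return ids


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(vocab_size, rng):
    """Hypothetical stand-in for the dLLM backbone: random logits for one position.
    A real backbone would condition on the whole (partially masked) sequence."""
    return [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]


def block_diffusion_decode(prefix, gen_len, block_size, steps_per_block, vocab_size, seed=0):
    """Generate gen_len tokens after prefix, one block at a time.
    Inside a block, all masked positions are predicted in parallel and the most
    confident predictions are committed first (assumed schedule)."""
    rng = random.Random(seed)
    seq = list(prefix) + [MASK] * gen_len
    end = len(seq)
    for start in range(len(prefix), end, block_size):
        block = range(start, min(start + block_size, end))
        for step in range(steps_per_block):
            masked = [i for i in block if seq[i] == MASK]
            if not masked:
                break
            # Score every masked position (a single forward pass in a real model;
            # the toy calls toy_logits per position only for simplicity).
            preds = {}
            for i in masked:
                probs = softmax(toy_logits(vocab_size, rng))
                tok = max(range(vocab_size), key=probs.__getitem__)
                preds[i] = (probs[tok], tok)
            # Commit enough tokens that the block is fully unmasked by the last step.
            k = math.ceil(len(masked) / (steps_per_block - step))
            for i in sorted(masked, key=lambda j: preds[j][0], reverse=True)[:k]:
                seq[i] = preds[i][1]
        # Tokens in this block are now final; a real system could cache their
        # keys/values instead of recomputing them (assumption).
    return seq


codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
image_ids = vq_encode([[0.1, -0.1], [1.9, 0.2], [0.9, 1.2]], codebook)
print(image_ids)  # [0, 2, 1]
out = block_diffusion_decode(image_ids, gen_len=8, block_size=4, steps_per_block=2, vocab_size=16)
print(len(out), MASK in out)  # 11 False
```

With `block_size=4` and `steps_per_block=2`, each block is finished in two refinement rounds instead of four sequential token steps, which is the kind of saving parallel decoding aims for. Because committed blocks never change, later blocks can presumably reuse work on the finished prefix; the paper's actual prefix-aware scheme is not described in this abstract.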

Inclusion AI, Tiwei Bie, Haoxing Chen, Tieyuan Chen, Zhenglin Cheng, Long Cui, Kai Gan, Zhicheng Huang, Zhenzhong Lan, Haoquan Li, Jianguo Li, Tao Lin, Qi Qin, Hongjun Wang, Xiaomei Wang, Haoyuan Wu, Yi Xin, Junbo Zhao (2026)

Related benchmarks

Task	Dataset	Metric	Result	
Text-to-Image Generation	GenEval	Overall Score	89	914
Multimodal Understanding	MMStar	--	--	511
Optical Character Recognition	OCRBench	Score	75.7	486
Mathematical Reasoning	WeMath	--	--	317
Multimodal Understanding	MMBench CN	--	--	302
Document Visual Question Answering	DocVQA	ANLS	89.5	301
Text-to-Image Generation	DPG	Overall Score	87.76	270
Visual Question Answering	SimpleVQA	Accuracy	0.44	225
Multimodal Reasoning	MMMU (val)	Accuracy	50.1	168
Infographic Question Answering	InfoVQA	ANLS	70.1	117

Only 10 of the 30 benchmark results are listed above.
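Two rows (DocVQA and InfoVQA) report ANLS, the Average Normalized Levenshtein Similarity used in document VQA. For each question, the prediction is compared with every reference answer; a pair scores 1 - NL when the normalized edit distance NL is below a threshold tau (conventionally 0.5) and 0 otherwise, the best pair is kept, and the per-question scores are averaged. The minimal Python sketch below assumes lowercasing and whitespace stripping as normalization; official evaluation scripts may differ in such details. Since ANLS lies in [0, 1], the table appears to report it scaled by 100.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def anls(predictions, references, tau=0.5):
    """Average Normalized Levenshtein Similarity over questions.
    predictions: list[str]; references: list[list[str]] (several valid answers per question)."""
    total = 0.0
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for ans in answers:
            g = ans.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            # Near matches earn partial credit; distant answers score zero.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)


print(levenshtein("kitten", "sitting"))  # 3
print(anls(["helo", "paris"], [["hello"], ["London"]]))  # 0.4
```

The threshold keeps the metric tolerant of small OCR slips ("helo" still earns 0.8) while giving no credit to answers that are simply wrong.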
