Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

About

We introduce Lumina-DiMOO, an open-source foundational model for seamless multi-modal generation and understanding. Lumina-DiMOO sets itself apart from prior unified models by using fully discrete diffusion modeling to handle inputs and outputs across modalities. This approach allows Lumina-DiMOO to achieve higher sampling efficiency than previous autoregressive (AR) or hybrid AR-Diffusion paradigms and to support a broad spectrum of multi-modal tasks, including text-to-image generation, image-to-image generation (e.g., image editing, subject-driven generation, and image inpainting), and image understanding. Lumina-DiMOO achieves state-of-the-art performance on multiple benchmarks, surpassing existing open-source unified multi-modal models. To foster further advancements in multi-modal and discrete diffusion model research, we release our code and checkpoints to the community. Project Page: https://synbol.github.io/Lumina-DiMOO.
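The sampling-efficiency claim can be illustrated with a toy sketch. The abstract does not describe the model's actual sampler, so the code below assumes one common form of discrete diffusion decoding (start from all-masked tokens and commit several positions per step). Every name in it (`MASK`, `toy_predict`, `parallel_unmask`) is hypothetical, and a random stand-in replaces the real network.

```python
import math
import random

MASK = -1  # hypothetical placeholder id for a not-yet-generated token


def toy_predict(tokens, vocab_size, rng):
    """Stand-in for a model (assumption): propose a (token, confidence)
    pair for every position that is still masked."""
    return {i: (rng.randrange(vocab_size), rng.random())
            for i, t in enumerate(tokens) if t == MASK}


def parallel_unmask(length, steps, vocab_size=16, seed=0):
    """Fill `length` masked positions in at most `steps` passes by
    committing the most confident proposals at each pass."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    passes = 0
    for step in range(steps):
        proposals = toy_predict(tokens, vocab_size, rng)
        if not proposals:
            break  # everything is already decoded
        passes += 1
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(proposals) / (steps - step))
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)[:k]
        for pos, (tok, _conf) in best:
            tokens[pos] = tok
    return tokens, passes


tokens, passes = parallel_unmask(length=64, steps=8)
print(passes, MASK in tokens)
```

In this toy setting, 64 tokens are decoded in 8 passes, whereas token-by-token AR decoding would need 64 passes. The real model's schedule and speedup may differ.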

Yi Xin, Qi Qin, Siqi Luo, Kaiwen Zhu, Juncheng Yan, Yan Tai, Jiayi Lei, Yuewen Cao, Keqi Wang, Yibin Wang, Jinbin Bai, Qian Yu, Dengyang Jiang, Yuandong Pu, Haoxing Chen, Le Zhuo, Junjun He, Gen Luo, Tianbin Li, Ming Hu, Jin Ye, Shenglong Ye, Bo Zhang, Chang Xu, Wenhai Wang, Hongsheng Li, Guangtao Zhai, Tianfan Xue, Bin Fu, Xiaohong Liu, Yu Qiao, Yihao Liu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score | 88 | 467 |
| Text-to-Image Generation | GenEval | GenEval Score | 88 | 277 |
| Text-to-Image Generation | DPG | Overall Score | 86.04 | 131 |
| Text-to-Image Generation | GenEval | Two Objects | 94 | 87 |
| Multimodal Understanding | MMMU | MMMU Score | 41.4 | 78 |
| Text-to-Image Generation | DPGBench | DPGBench Score | 86.04 | 31 |
| Multimodal Understanding | MMB | Score | 58.7 | 30 |
| Multimodal Understanding | SEED | SEED Score | 71.4 | 27 |
| Text-to-Image Generation | UniGenBench | UniGenBench | 71.12 | 17 |
| Reasoning-based Image Editing | UniREditBench 44 (test) | Real World Score | 51.4 | 10 |
Showing 10 of 11 rows
