Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

About

We introduce Lumina-Image 2.0, an advanced text-to-image generation framework that achieves significant progress over previous work, Lumina-Next. Lumina-Image 2.0 is built upon two key principles: (1) Unification: it adopts a unified architecture (Unified Next-DiT) that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and allowing seamless task expansion. In addition, since high-quality captioners can provide semantically well-aligned text-image training pairs, we introduce a unified captioning system, Unified Captioner (UniCap), specifically designed for T2I generation tasks. UniCap excels at generating comprehensive and accurate captions, accelerating convergence and enhancing prompt adherence. (2) Efficiency: to improve the efficiency of our proposed model, we develop multi-stage progressive training strategies and introduce inference acceleration techniques without compromising image quality. Extensive evaluations on academic benchmarks and public text-to-image arenas show that Lumina-Image 2.0 delivers strong performance even with only 2.6B parameters, highlighting its scalability and design efficiency. We have released our training details, code, and models at https://github.com/Alpha-VLLM/Lumina-Image-2.0.
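To make the "joint sequence" idea concrete, here is a minimal sketch of single-head self-attention over concatenated text and image tokens. It is not the Unified Next-DiT implementation: the helper names (`matmul`, `softmax`, `joint_self_attention`, `rand_matrix`), the token counts, the embedding width, and the random projection weights are all assumptions made for illustration. The only point it shows is that once both modalities share one sequence, every token can attend to every other token without a separate cross-attention module.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def joint_self_attention(text_tokens, image_tokens, w_q, w_k, w_v):
    """Single-head self-attention over the joint [text; image] sequence."""
    seq = text_tokens + image_tokens  # joint sequence: text first, then image
    d = len(seq[0])
    # The same (hypothetical) projections are applied to both modalities.
    q, k, v = matmul(seq, w_q), matmul(seq, w_k), matmul(seq, w_v)
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

def rand_matrix(rng, rows, cols):
    """Illustrative random matrix; stands in for learned embeddings or weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim = 8                              # assumed embedding width
text = rand_matrix(rng, 4, dim)      # 4 assumed text tokens
image = rand_matrix(rng, 16, dim)    # 16 assumed image patch tokens
w_q, w_k, w_v = (rand_matrix(rng, dim, dim) for _ in range(3))

out, attn = joint_self_attention(text, image, w_q, w_k, w_v)
print(len(out), len(out[0]))    # sequence length and width of the output
print(len(attn), len(attn[0]))  # attention is square over the joint sequence
```

Each row of `attn` spans both text and image positions, so an image token's update can draw directly on caption tokens and vice versa.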

Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text-to-Image Generation | GenEval | GenEval Score | 73 | 277
Text-to-Image Generation | DPG-Bench | Overall Score | 87.2 | 173
Text-to-Image Generation | GenEval (test) | Two Obj. Acc | 87 | 169
Text-to-Image Generation | T2I-CompBench | Shape Fidelity | 60.28 | 94
Spatial Reasoning Generation | OneIG-EN (test) | Alignment Score | 81.9 | 26
Text-to-Image Generation | OneIG-ZH | Alignment | 73.1 | 24
Text-to-Image Generation | DPG (test) | Entity Fidelity | 91.97 | 16
Text-to-Image Generation | Artificial Analysis Text-to-Image Arena as of February 23, 2025 | Overall Score | 982 | 6
Text-to-Image Generation | AGI-Eval text-to-image arena 6 | ELO Score | 0.4545 | 6
Text-to-Image Generation | Rapidata text-to-image arena as of February 23, 2025 | Overall Score | 969 | 6