
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer

About

Recent advancements in image generative foundation models have prioritized quality improvements, often at the cost of increased computational complexity and inference latency. To address this critical trade-off, we introduce HiDream-I1, a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. HiDream-I1 is built on a new sparse Diffusion Transformer (DiT) structure. Specifically, it starts with a dual-stream decoupled sparse DiT design with a dynamic Mixture-of-Experts (MoE) architecture, in which two separate streams first process image and text tokens independently. A single-stream sparse DiT structure with dynamic MoE architecture is then adopted to enable multi-modal interaction for image generation in a cost-efficient manner. To support flexible accessibility across varied model capabilities, we provide HiDream-I1 in three variants: HiDream-I1-Full, HiDream-I1-Dev, and HiDream-I1-Fast. Furthermore, we go beyond typical text-to-image generation and remould HiDream-I1 with additional image conditions to perform precise, instruction-based editing on given images, yielding a new instruction-based image editing model named HiDream-E1. Finally, by integrating text-to-image generation and instruction-based image editing, HiDream-I1 evolves into a comprehensive image agent (HiDream-A1) capable of fully interactive image creation and refinement. To accelerate multi-modal AIGC research, we have open-sourced the code and model weights of HiDream-I1-Full, HiDream-I1-Dev, HiDream-I1-Fast, and HiDream-E1 through our project websites: https://github.com/HiDream-ai/HiDream-I1 and https://github.com/HiDream-ai/HiDream-E1. All features can be experienced directly via https://vivago.ai/studio.
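The dual-stream-then-single-stream routing described above can be loosely sketched in a few lines. This is a minimal NumPy toy under stated assumptions, not HiDream-I1's actual implementation: the top-k gating rule, the tanh "expert" MLPs, the router matrix, and all dimensions and token counts are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(tokens, expert_weights, router, top_k=2):
    """Toy dynamic MoE layer: route each token to its top-k experts
    and mix the expert outputs with softmax gate weights (a sketch,
    not the paper's architecture)."""
    logits = tokens @ router                        # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # top-k expert ids per token
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        sel = top[i]
        gates = np.exp(logits[i, sel])
        gates /= gates.sum()                        # softmax over selected experts
        for g, e in zip(gates, sel):
            out[i] += g * np.tanh(tok @ expert_weights[e])  # toy expert MLP
    return out

d, n_img, n_txt, n_experts = 8, 4, 3, 4
experts = rng.normal(size=(n_experts, d, d))
router = rng.normal(size=(d, n_experts))

# Dual-stream stage: image and text tokens pass through separate MoE stacks.
img = moe_layer(rng.normal(size=(n_img, d)), experts, router)
txt = moe_layer(rng.normal(size=(n_txt, d)), experts, router)

# Single-stream stage: the concatenated sequence interacts in one shared stack,
# mirroring the abstract's cost-efficient multi-modal fusion step.
fused = moe_layer(np.concatenate([img, txt]), experts, router)
print(fused.shape)  # (7, 8)
```

Because each token activates only `top_k` of the experts, compute grows with the number of routed experts rather than the total parameter count, which is the general efficiency argument behind sparse MoE designs.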

Qi Cai, Jingwen Chen, Yang Chen, Yehao Li, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Yiheng Zhang, Fengbin Gao, Peihan Xu, Yimeng Wang, Kai Yu, Wenxuan Chen, Ziwei Feng, Zijian Gong, Jianzhuang Pan, Yi Peng, Rui Tian, Siyu Wang, Bo Zhao, Ting Yao, Tao Mei • 2025

Related benchmarks

Task                       Dataset                 Result                   Rank
Text-to-Image Generation   GenEval                 Overall Score: 83        506
Text-to-Image Generation   GenEval                 Overall Score: 83        391
Text-to-Image Generation   DPG-Bench               Overall Score: 85.89     265
Text-to-Image Generation   GenEval (test)          Two Obj. Acc: 98         221
Text-to-Image Generation   GenEval                 Overall Score: 83        218
Text-to-Image Generation   DPG                     Overall Score: 85.89     172
Text-to-Image Generation   GenEval                 Overall Score: 83        96
Text-to-Image Generation   DPGBench                Attribute Score: 89.48   44
Text-to-Image Generation   StructT2IBench (test)   Chart Score: 9.47        40
Text-to-Image Generation   OneIG-ZH                Alignment: 62            34

Showing 10 of 35 rows.
