
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer

About

Recent advancements in image generative foundation models have prioritized quality improvements, but often at the cost of increased computational complexity and inference latency. To address this critical trade-off, we introduce HiDream-I1, a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. HiDream-I1 is constructed with a new sparse Diffusion Transformer (DiT) structure. Specifically, it starts with a dual-stream decoupled design of sparse DiT with a dynamic Mixture-of-Experts (MoE) architecture, in which two separate encoders first process image and text tokens independently. Then, a single-stream sparse DiT structure with a dynamic MoE architecture is adopted to trigger multi-modal interaction for image generation in a cost-efficient manner. To support flexible accessibility with varied model capabilities, we provide HiDream-I1 in three variants: HiDream-I1-Full, HiDream-I1-Dev, and HiDream-I1-Fast. Furthermore, we go beyond typical text-to-image generation and remould HiDream-I1 with additional image conditions to perform precise, instruction-based editing on given images, yielding a new instruction-based image editing model, namely HiDream-E1. Ultimately, by integrating text-to-image generation and instruction-based image editing, HiDream-I1 evolves into a comprehensive image agent (HiDream-A1) capable of fully interactive image creation and refinement. To accelerate multi-modal AIGC research, we have open-sourced all the code and model weights of HiDream-I1-Full, HiDream-I1-Dev, HiDream-I1-Fast, and HiDream-E1 through our project websites: https://github.com/HiDream-ai/HiDream-I1 and https://github.com/HiDream-ai/HiDream-E1. All features can be directly experienced via https://vivago.ai/studio.
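
The abstract does not spell out how the dynamic MoE layers route tokens, so the following is only a minimal sketch of a generic top-k routed MoE layer, a common way to make a transformer block sparse. It is not the HiDream-I1 implementation. The class name SparseMoE, the expert count, the top_k value, and the use of a single linear map per expert are all assumptions chosen for illustration; attention and diffusion timestep conditioning are omitted entirely.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Small random weight matrix; stands in for trained weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class SparseMoE:
    """Hypothetical top-k MoE layer: each token activates only top_k experts."""

    def __init__(self, dim, num_experts=4, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score per expert for each token.
        self.router = make_matrix(num_experts, dim, rng)
        # Assumption: each expert is a single dim x dim linear map.
        self.experts = [make_matrix(dim, dim, rng) for _ in range(num_experts)]

    def route(self, token):
        """Return (expert_index, weight) pairs for the top_k experts."""
        probs = softmax(matvec(self.router, token))
        order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = order[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        # Renormalize so the selected experts' weights sum to 1.
        return [(i, probs[i] / norm) for i in chosen]

    def forward(self, token):
        """Weighted sum of the selected experts' outputs; others are skipped."""
        out = [0.0] * len(token)
        for idx, weight in self.route(token):
            expert_out = matvec(self.experts[idx], token)
            out = [o + weight * e for o, e in zip(out, expert_out)]
        return out


if __name__ == "__main__":
    moe = SparseMoE(dim=8, num_experts=4, top_k=2)
    token = [0.1 * i for i in range(8)]
    print("selected experts and weights:", moe.route(token))
    print("output dimension:", len(moe.forward(token)))
```

In this sketch only top_k of the num_experts expert matrices are evaluated per token, which is the source of the compute saving that sparse designs aim for: parameter count grows with the number of experts while per-token cost grows only with top_k.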

Qi Cai, Jingwen Chen, Yang Chen, Yehao Li, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Yiheng Zhang, Fengbin Gao, Peihan Xu, Yimeng Wang, Kai Yu, Wenxuan Chen, Ziwei Feng, Zijian Gong, Jianzhuang Pan, Yi Peng, Rui Tian, Siyu Wang, Bo Zhao, Ting Yao, Tao Mei • 2025

Related benchmarks

Task                         | Dataset              | Metric                 | Result | Rank
Text-to-Image Generation     | GenEval              | Overall Score          | 83     | 467
Text-to-Image Generation     | DPG-Bench            | Overall Score          | 85.89  | 173
Text-to-Image Generation     | GenEval (test)       | Two Obj. Acc           | 98     | 169
Text-to-Image Generation     | DPG                  | Overall Score          | 85.89  | 131
Text-to-Image Generation     | GenEval              | Overall Score          | 83     | 68
Spatial Reasoning Generation | OneIG-EN (test)      | Alignment Score        | 82.9   | 26
Text-to-Image Generation     | OneIG-ZH             | Alignment              | 62     | 24
Geometric diagram generation | GenExam-Math (test)  | Structural Correctness | 0.00   | 20
Text-to-Image Generation     | DPG (test)           | Entity Fidelity        | 90.22  | 16
Spatial Reasoning Generation | T2I-CoReBench (test) | MI                     | 65.2   | 16
Showing 10 of 20 rows
