HunyuanImage 3.0 Technical Report

About

We present HunyuanImage 3.0, a native multimodal model that unifies multimodal understanding and generation within an autoregressive framework, with its image generation module publicly available. HunyuanImage 3.0 builds on several key components: meticulous data curation, advanced architecture design, a native Chain-of-Thoughts schema, progressive model pre-training, aggressive model post-training, and an efficient infrastructure that enables large-scale training and inference. With these advancements, we successfully trained a Mixture-of-Experts (MoE) model comprising over 80 billion parameters in total, with 13 billion parameters activated per token during inference, making it the largest and most powerful open-source image generative model to date. We conducted extensive experiments; automatic and human evaluations of text-image alignment and visual quality show that HunyuanImage 3.0 rivals previous state-of-the-art models. By releasing the code and weights of HunyuanImage 3.0, we aim to enable the community to explore new ideas with a state-of-the-art foundation model, fostering a dynamic and vibrant multimodal ecosystem. All open-source assets are publicly available at https://github.com/Tencent-Hunyuan/HunyuanImage-3.0

Tencent Hunyuan Foundation Model Team • 2025
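
The abstract's two parameter counts (over 80 billion in total, 13 billion activated per token) follow from how a Mixture-of-Experts layer works: a router selects only a few experts for each token, so most expert weights stay idle for that token. The sketch below is a minimal illustration of generic top-k routing, not the routing scheme used by HunyuanImage 3.0. The four-expert router, the choice of k = 2, and the function names `route_top_k` and `active_fraction` are hypothetical and chosen only for the example; the abstract does not give the expert count or k.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Generic top-k gating, assumed here for illustration only.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def active_fraction(total_params_b, active_params_b):
    """Share of parameters used per token (both arguments in billions)."""
    return active_params_b / total_params_b

# Figures from the abstract. The total is stated as "over 80 billion",
# so 80 is a lower bound and the result is an upper bound on the share.
frac = active_fraction(80.0, 13.0)

# Hypothetical router scores for one token across four toy experts.
selected = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)
expert_ids = [i for i, _ in selected]
weights = [w for _, w in selected]
```

With the abstract's figures, `frac` evaluates to 13/80, so at most roughly one sixth of the parameters take part in processing any single token. This is why per-token inference cost tracks the activated count rather than the total.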

Related benchmarks

Task	Dataset	Result
Text-to-Image Generation	GenEval	Overall Score: 72	914
Image Generation	ImageNet 256x256	IS: 73.84	606
Text-to-Image Generation	GenEval	Overall Score: 72	581
Text-to-Image Generation	GenEval	GenEval Score: 72	459
Text-to-Image Generation	DPG	Overall Score: 86.1	270
Image Reconstruction	ImageNet-1k 256x256 (val)	--	144
Text-to-Image Generation	DPGBench	DPGBench Score: 86.1	133
Image Generation	GenEval	Overall GenEval Score: 63	132
World Knowledge Image Generation	WISE	Overall Score: 58	110
Text-to-Image Generation	GenEval	Overall Score: 72	96

Showing 10 of 57 rows
