PixelFlow: Pixel-Space Generative Models with Flow

About

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computation cost in pixel space affordable. It achieves an FID of 1.98 on the 256$\times$256 ImageNet class-conditional image generation benchmark. The qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.

Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo • 2025

Related benchmarks

Task	Dataset	Result
Class-conditional Image Generation	ImageNet 256x256	Inception Score (IS): 282.1	967
Text-to-Image Generation	GenEval	Overall Score: 60	704
Image Generation	ImageNet 256x256	IS: 282.1	517
Class-conditional Image Generation	ImageNet 256x256 (val)	Inception Score (IS): 282.1	493
Image Generation	ImageNet 256x256 (val)	FID: 1.98	399
Class-conditional Image Generation	ImageNet 256x256 (train)	IS: 282.1	367
Text-to-Image Generation	GenEval (test)	--	250
Image Generation	ImageNet 256x256 (train)	FID: 1.98	211
Class-conditional Image Generation	ImageNet 256x256 (train val)	FID: 1.98	203
Text-to-Image Generation	GenEval	Overall Score (GenEval): 0.6	153

Showing 10 of 16 rows
