
PixelFlow: Pixel-Space Generative Models with Flow

About

We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps the computational cost of working in pixel space affordable. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. Qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.
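The abstract does not detail how the cascade is built. As a rough illustration only, the sketch below runs a coarse-to-fine flow sampler in pixel space. The first stage integrates a flow from pure noise at low resolution. Each later stage upsamples the previous output, partially re-noises it, and integrates only the remaining part of the trajectory at the higher resolution. The sketch assumes a rectified-flow style linear path x_t = t·x_1 + (1 − t)·x_0 (x_0 noise, x_1 image) and fixed-step Euler integration. The trained network is replaced by a hypothetical `oracle_velocity` that already knows the target image. The restart time `t_restart`, the average-pool and nearest-neighbour resizing, and the stage count are illustrative assumptions, not PixelFlow's actual design.

```python
import numpy as np


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of an (H, W) array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def downsample_avg(x, factor=2):
    """Average-pool an (H, W) array by an integer factor (H, W divisible by it)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def oracle_velocity(x, t, target):
    """Hypothetical stand-in for a trained network v_theta(x, t).

    If the data distribution were the single image `target`, the linear path
    x_t = t * target + (1 - t) * noise would have velocity (target - x_t) / (1 - t).
    """
    return (target - x) / (1.0 - t)


def euler_flow(x, t_start, target, steps):
    """Integrate dx/dt = v(x, t) from t_start to 1 with fixed-size Euler steps."""
    ts = np.linspace(t_start, 1.0, steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * oracle_velocity(x, t, target)
    return x


def cascade_sample(target, num_stages=2, t_restart=0.3, steps=8, seed=0):
    """Toy coarse-to-fine cascade (illustrative schedule, not PixelFlow's).

    Returns the output of every stage, coarsest first.
    """
    rng = np.random.default_rng(seed)
    # Per-stage targets, coarsest first (e.g. 2x2, 4x4, ... for a 4x4 target).
    targets = [target]
    for _ in range(num_stages - 1):
        targets.insert(0, downsample_avg(targets[0]))

    # Stage 1: start from pure noise at the lowest resolution, integrate t: 0 -> 1.
    x = rng.standard_normal(targets[0].shape)
    x = euler_flow(x, 0.0, targets[0], steps)
    outputs = [x]

    for stage_target in targets[1:]:
        # Upsample the previous stage, mix it back with fresh noise at time
        # t_restart, and integrate only the interval [t_restart, 1].
        noise = rng.standard_normal(stage_target.shape)
        x = t_restart * upsample_nearest(x) + (1.0 - t_restart) * noise
        x = euler_flow(x, t_restart, stage_target, steps)
        outputs.append(x)
    return outputs
```

Because the oracle knows the answer, every stage lands on its target. With a learned velocity network, the intuition behind this kind of cascade is that the early, noisiest steps run at low resolution where each step is cheap, and only the interval [t_restart, 1] is integrated at full resolution.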

Shoufa Chen, Chongjian Ge, Shilong Zhang, Peize Sun, Ping Luo • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Class-conditional Image Generation | ImageNet 256x256 | Inception Score (IS) | 282.1 | 815 |
| Class-conditional Image Generation | ImageNet 256x256 (val) | FID | 1.98 | 427 |
| Text-to-Image Generation | GenEval | Overall Score | 60 | 391 |
| Image Generation | ImageNet 256x256 | IS | 282.1 | 359 |
| Class-conditional Image Generation | ImageNet 256x256 (train) | IS | 282.1 | 345 |
| Image Generation | ImageNet 256x256 (val) | FID | 1.98 | 340 |
| Text-to-Image Generation | GenEval (test) | -- | -- | 221 |
| Class-conditional Image Generation | ImageNet 256x256 (train val) | FID | 1.98 | 178 |
| Image Generation | ImageNet 256x256 (train) | FID | 1.98 | 164 |
| Class-conditional Image Generation | ImageNet-1K 256x256 1.0 (train) | gFID | 1.98 | 35 |
Showing 10 of 12 rows
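Several rows above report FID (1.98). FID is the Fréchet distance between two Gaussians fitted to feature embeddings (conventionally Inception-v3 pool features) of real and generated images: d² = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The minimal NumPy sketch below implements only that formula, assuming the features have already been extracted. The function names are hypothetical. This is not the reference evaluation code behind the 1.98 figure, and the feature extractor, image preprocessing, and sample count all affect whether numbers are comparable.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})
    """
    diff = mu1 - mu2
    # For PSD covariances, the eigenvalues of sigma1 @ sigma2 are real and
    # non-negative, and Tr((sigma1 sigma2)^{1/2}) is the sum of their square
    # roots. Clip small negative / imaginary parts caused by round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feats_real, feats_gen):
    """FID from two (N, D) arrays of image features (hypothetical helper)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sig_r = np.cov(feats_real, rowvar=False)
    sig_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_g, sig_g)
```

Computing the trace term from eigenvalues avoids a full matrix square root (for example via SciPy), at the cost of assuming both covariance estimates are positive semi-definite. Lower is better, and identical feature sets give 0.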

Other info

Code: https://github.com/ShoufaChen/PixelFlow
