Mean Flows for One-step Generative Modeling
About
We propose a principled and effective framework for one-step generative modeling. We introduce the notion of average velocity to characterize flow fields, in contrast to instantaneous velocity modeled by Flow Matching methods. A well-defined identity between average and instantaneous velocities is derived and used to guide neural network training. Our method, termed the MeanFlow model, is self-contained and requires no pre-training, distillation, or curriculum learning. MeanFlow demonstrates strong empirical performance: it achieves an FID of 3.43 with a single function evaluation (1-NFE) on ImageNet 256x256 trained from scratch, significantly outperforming previous state-of-the-art one-step diffusion/flow models. Our study substantially narrows the gap between one-step diffusion/flow models and their multi-step predecessors, and we hope it will motivate future research to revisit the foundations of these powerful models.
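The abstract names an identity between average and instantaneous velocities but does not write it out. The block below is a sketch of how such an identity can be obtained. The notation is an assumption introduced here for illustration, not taken from the text above: $z_t$ is the state at time $t$, $v$ the instantaneous velocity, $u$ the average velocity over an interval $[r, t]$, with $t = 1$ taken as noise and $t = 0$ as data.

```latex
% Assumed notation (not given in the abstract): z_t, v, u, r, t as described above.
\begin{align*}
  % Average velocity: displacement over [r, t] divided by the interval length.
  u(z_t, r, t) &\triangleq \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau \\
  % Multiply through by (t - r).
  (t - r)\, u(z_t, r, t) &= \int_r^t v(z_\tau, \tau)\, d\tau \\
  % Differentiate both sides with respect to t, holding r fixed.
  u(z_t, r, t) + (t - r)\, \frac{d}{dt} u(z_t, r, t) &= v(z_t, t) \\
  % Rearranged: average velocity in terms of instantaneous velocity.
  u(z_t, r, t) &= v(z_t, t) - (t - r)\, \frac{d}{dt} u(z_t, r, t) \\
  % One-step sampling under the assumed time convention (r = 0, t = 1).
  z_0 &= z_1 - u(z_1, 0, 1)
\end{align*}
```

Under these assumptions, a network trained to match $u$ can map noise to a sample with a single evaluation, which is consistent with the 1-NFE setting described above; a model of $v$ alone would instead require numerically integrating over several steps.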
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class-conditional Image Generation | ImageNet 256x256 | -- | 441 |
| Class-conditional Image Generation | ImageNet 256x256 (val) | FID 2.2 | 293 |
| Image Generation | ImageNet 256x256 | FID 2.2 | 243 |
| Class-conditional Image Generation | ImageNet 256x256 (train val) | FID 2.93 | 178 |
| Unconditional Image Generation | CIFAR-10 | FID 2.92 | 171 |
| Class-conditional Image Generation | ImageNet 256x256 (test) | FID 1.74 | 167 |
| Unconditional Generation | CIFAR-10 (test) | FID 2.92 | 102 |
| Unconditional Image Generation | CIFAR-10 32x32 (test) | FID 2.92 | 94 |
| Class-conditional generation | ImageNet 256 x 256 1k (val) | FID 2.93 | 67 |
| Conditional Image Generation | ImageNet 256px 2012 (val) | FID 2.2 | 50 |