Inductive Moment Matching
About
Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often suffers from instability and requires extensive tuning. To resolve these trade-offs, we propose Inductive Moment Matching (IMM), a new class of generative models for one- or few-step sampling with a single-stage training procedure. Unlike distillation, IMM does not require pre-training initialization or the joint optimization of two networks; and unlike Consistency Models, IMM guarantees distribution-level convergence and remains stable under a wide range of hyperparameters and standard model architectures. IMM surpasses diffusion models on ImageNet-256x256 with 1.99 FID using only 8 inference steps, and achieves a state-of-the-art 2-step FID of 1.98 on CIFAR-10 for a model trained from scratch.
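The abstract above does not spell out IMM's training objective, but the core idea behind moment matching in general is to compare two sample distributions through their moments rather than point-wise. As an illustrative sketch only (not the paper's actual loss; all function names here are hypothetical), a kernel-based squared Maximum Mean Discrepancy estimator matches all moments of two sample sets under an RBF kernel:

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    # Pairwise RBF kernel values between rows of x and rows of y.
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(samples_p, samples_q, bandwidth=1.0):
    # Biased squared Maximum Mean Discrepancy estimate: zero (in
    # expectation, up to sampling noise) iff the two distributions
    # agree in all moments captured by the kernel.
    k_pp = rbf_kernel(samples_p, samples_p, bandwidth).mean()
    k_qq = rbf_kernel(samples_q, samples_q, bandwidth).mean()
    k_pq = rbf_kernel(samples_p, samples_q, bandwidth).mean()
    return k_pp + k_qq - 2.0 * k_pq

rng = np.random.default_rng(0)
# Two draws from the same distribution score near zero ...
same = mmd_squared(rng.normal(size=(256, 2)), rng.normal(size=(256, 2)))
# ... while a shifted distribution scores clearly higher.
diff = mmd_squared(rng.normal(size=(256, 2)),
                   rng.normal(loc=3.0, size=(256, 2)))
print(same < diff)
```

In a generative-model setting, a discrepancy of this kind would be driven to zero between model samples and data samples, which is the sense in which moment matching gives distribution-level (rather than trajectory-level) training signal.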
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class-conditional Image Generation | ImageNet 256x256 (val) | FID 1.99 | 293 |
| Image Generation | ImageNet 256x256 | FID 3.99 | 243 |
| Class-conditional Image Generation | ImageNet 256x256 (train val) | FID 3.99 | 178 |
| Unconditional Image Generation | CIFAR-10 | FID 3.2 | 171 |
| Class-conditional Image Generation | ImageNet 256x256 (test) | FID 3.99 | 167 |
| Unconditional Image Generation | CIFAR-10 unconditional | FID 1.98 | 159 |
| Unconditional Generation | CIFAR-10 (test) | FID 3.2 | 102 |
| Unconditional Image Generation | CIFAR-10 32x32 (test) | FID 3.2 | 94 |
| Class-conditional generation | ImageNet 256x256 1k (val) | FID 5.33 | 67 |
| Conditional Image Generation | ImageNet 256px 2012 (val) | FID 3.99 | 50 |