
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

About

Few-step diffusion or flow-based generative models typically distill a velocity-predicting teacher into a student that predicts a shortcut towards denoised data. This format mismatch has led to complex distillation procedures that often suffer from a quality-diversity trade-off. To address this, we propose policy-based flow models ($\pi$-Flow). $\pi$-Flow modifies the output layer of a student flow model to predict a network-free policy at one timestep. The policy then produces dynamic flow velocities at future substeps with negligible overhead, enabling fast and accurate ODE integration on these substeps without extra network evaluations. To match the policy's ODE trajectory to the teacher's, we introduce a novel imitation distillation approach, which matches the policy's velocity to the teacher's along the policy's trajectory using a standard $\ell_2$ flow matching loss. By simply mimicking the teacher's behavior, $\pi$-Flow enables stable and scalable training and avoids the quality-diversity trade-off. On ImageNet 256$^2$, it attains a 1-NFE FID of 2.85, outperforming previous 1-NFE models of the same DiT architecture. On FLUX.1-12B and Qwen-Image-20B at 4 NFEs, $\pi$-Flow achieves substantially better diversity than state-of-the-art DMD models, while maintaining teacher-level quality.
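The mechanism described above can be illustrated with a small toy sketch. It is not the authors' implementation: the policy form used here (a velocity pointing away from a predicted clean sample `x0_hat`), the time convention (t = 1 is noise, t = 0 is data), the Euler integrator, and all function names are assumptions chosen for illustration. The sketch shows the two ingredients the abstract names: a policy that yields velocities at substeps without further network calls, and an $\ell_2$ loss that compares policy and teacher velocities on states visited by the policy's own rollout. Gradient computation and the student network itself are omitted.

```python
import numpy as np

def make_policy(x0_hat):
    """Build a network-free policy from one (hypothetical) student output x0_hat.

    Assumed form: v(x, t) = (x - x0_hat) / t, so the velocity changes with the
    current state and time but needs no further network evaluation.
    """
    def policy(x, t):
        return (x - x0_hat) / t
    return policy

def rollout(policy, x_start, t_start, t_end, n_substeps):
    """Euler-integrate dx/dt = policy(x, t) from t_start to t_end.

    Returns the final state and the list of (state, time) pairs at which
    the policy was queried.
    """
    ts = np.linspace(t_start, t_end, n_substeps + 1)
    x = x_start.copy()
    visited = []
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        visited.append((x.copy(), t_cur))
        x = x + (t_next - t_cur) * policy(x, t_cur)
    return x, visited

def imitation_loss(policy, teacher_velocity, visited):
    """Mean squared difference between policy and teacher velocities,
    evaluated on the states visited by the policy's own trajectory."""
    errs = [np.mean((policy(x, t) - teacher_velocity(x, t)) ** 2)
            for x, t in visited]
    return float(np.mean(errs))
```

In this toy setting, a policy whose velocities agree with the teacher's at every visited state has zero loss, while any mismatch yields a positive loss. In actual training the loss would be minimized with respect to the student parameters that produce the policy.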

Hansheng Chen, Kai Zhang, Hao Tan, Leonidas Guibas, Gordon Wetzstein, Sai Bi • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Class-conditional Image Generation | ImageNet 256x256 (val) | FID | 1.97 | 293 |
| Text-to-Image Generation | GenEval | GenEval Score | 83 | 277 |
| Image Generation | ImageNet 256x256 | FID | 1.97 | 243 |
| Text-to-Image Generation | DPG-Bench | DPG Score | 86.45 | 89 |
| Class-conditional generation | ImageNet 256 x 256 1k (val) | FID | 2.85 | 67 |
| Text-to-Image Generation | OneIG-Bench | Alignment | 0.881 | 33 |
| Text-to-Image Generation | MS-COCO 10K prompts 2014 (val) | FID | 29 | 19 |
| Text-to-Image Generation | HPS prompt set v2 | CLIP Score | 0.301 | 11 |
| Text-to-Image Generation | Align5000 1.0 (test) | CLIP Score | 0.323 | 9 |
