
Duality Models: An Embarrassingly Simple One-step Generation Paradigm

About

Consistency-based generative models like Shortcut and MeanFlow achieve impressive results via a target-aware design for solving the Probability Flow ODE (PF-ODE). Typically, such methods introduce a target time $r$ alongside the current time $t$ to modulate outputs between a local multi-step derivative ($r = t$) and a global few-step integral ($r = 0$). However, the conventional "one input, one output" paradigm enforces a partition of the training budget, often allocating a significant portion (e.g., 75% in MeanFlow) solely to the multi-step objective for stability. This separation forces a trade-off: allocating sufficient samples to the multi-step objective leaves the few-step generation undertrained, which harms convergence and limits scalability. To this end, we propose Duality Models (DuMo) via a "one input, dual output" paradigm. Using a shared backbone with dual heads, DuMo simultaneously predicts velocity $v_t$ and flow-map $u_t$ from a single input $x_t$. This applies geometric constraints from the multi-step objective to every sample, bounding the few-step estimation without separating training objectives, thereby significantly improving stability and efficiency. On ImageNet 256 $\times$ 256, a 679M Diffusion Transformer with SD-VAE achieves a state-of-the-art (SOTA) FID of 1.79 in just 2 steps. Code is available at: https://github.com/LINs-lab/DuMo
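The "one input, dual output" idea described above can be sketched as a toy forward pass: a shared trunk processes $x_t$ (conditioned on $t$ and $r$) once, and two lightweight heads read the same features to produce the velocity and flow-map predictions. This is a minimal NumPy illustration under assumed layer names and sizes, not the paper's 679M Diffusion Transformer:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(dim_in, dim_out):
    """Create a random linear layer as a (weight, bias) pair."""
    return rng.standard_normal((dim_in, dim_out)) * 0.02, np.zeros(dim_out)

dim = 8
# Shared backbone takes x_t plus the two time scalars t and r.
W_trunk, b_trunk = linear(dim + 2, dim)
# Dual heads: one for the velocity v_t, one for the flow map u_t.
W_v, b_v = linear(dim, dim)
W_u, b_u = linear(dim, dim)

def dumo_forward(x_t, t, r):
    """One forward pass yields BOTH predictions from the same features."""
    inp = np.concatenate([x_t, [t, r]])
    h = np.maximum(inp @ W_trunk + b_trunk, 0.0)  # shared ReLU trunk
    v_pred = h @ W_v + b_v  # local derivative target (multi-step objective)
    u_pred = h @ W_u + b_u  # global flow-map target (few-step objective)
    return v_pred, u_pred

x_t = rng.standard_normal(dim)
v_pred, u_pred = dumo_forward(x_t, t=0.7, r=0.0)
```

Because both heads share one forward pass, every sample can contribute to both the multi-step (velocity) and few-step (flow-map) losses, rather than the batch being partitioned between them as in a "one input, one output" design.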

Peng Sun, Xinyi Shang, Tao Lin, Zhiqiang Shen · 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Unconditional Image Generation | CIFAR-10 32×32 (test) | FID 2.86 | 94 |
| Class-conditional Image Generation | ImageNet 256×256 1k (val) | FID 1.79 | 67 |
| Class-conditional Image Generation | ImageNet-1K 512×512 (val) | FID 2.23 | 33 |
