
Duality Models: An Embarrassingly Simple One-step Generation Paradigm

About

Consistency-based generative models like Shortcut and MeanFlow achieve impressive results via a target-aware design for solving the Probability Flow ODE (PF-ODE). Typically, such methods introduce a target time $r$ alongside the current time $t$ to modulate outputs between a local multi-step derivative ($r = t$) and a global few-step integral ($r = 0$). However, the conventional "one input, one output" paradigm enforces a partition of the training budget, often allocating a significant portion (e.g., 75% in MeanFlow) solely to the multi-step objective for stability. This separation forces a trade-off: allocating sufficient samples to the multi-step objective leaves the few-step generation undertrained, which harms convergence and limits scalability. To this end, we propose Duality Models (DuMo) via a "one input, dual output" paradigm. Using a shared backbone with dual heads, DuMo simultaneously predicts velocity $v_t$ and flow-map $u_t$ from a single input $x_t$. This applies geometric constraints from the multi-step objective to every sample, bounding the few-step estimation without separating training objectives, thereby significantly improving stability and efficiency. On ImageNet 256 $\times$ 256, a 679M Diffusion Transformer with SD-VAE achieves a state-of-the-art (SOTA) FID of 1.79 in just 2 steps. Code is available at: https://github.com/LINs-lab/DuMo
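The "one input, dual output" idea described above can be sketched as a toy forward pass: a shared trunk processes $x_t$ (conditioned on $t$ and $r$) once, and two lightweight heads read the same features to produce the velocity and flow-map predictions. This is a minimal NumPy illustration under assumed layer names and sizes, not the paper's 679M Diffusion Transformer:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(dim_in, dim_out):
    """Create a random linear layer as a (weight, bias) pair."""
    return rng.standard_normal((dim_in, dim_out)) * 0.02, np.zeros(dim_out)

dim = 8
# Shared backbone takes x_t plus the two time scalars t and r.
W_trunk, b_trunk = linear(dim + 2, dim)
# Dual heads: one for the velocity v_t, one for the flow map u_t.
W_v, b_v = linear(dim, dim)
W_u, b_u = linear(dim, dim)

def dumo_forward(x_t, t, r):
    """One forward pass yields BOTH predictions from the same features."""
    inp = np.concatenate([x_t, [t, r]])
    h = np.maximum(inp @ W_trunk + b_trunk, 0.0)  # shared ReLU trunk
    v_pred = h @ W_v + b_v  # local derivative target (multi-step objective)
    u_pred = h @ W_u + b_u  # global flow-map target (few-step objective)
    return v_pred, u_pred

x_t = rng.standard_normal(dim)
v_pred, u_pred = dumo_forward(x_t, t=0.7, r=0.0)
```

Because both heads share one forward pass, every sample can contribute to both the multi-step (velocity) and few-step (flow-map) losses, rather than the batch being partitioned between them as in a "one input, one output" design.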

Peng Sun, Xinyi Shang, Tao Lin, Zhiqiang Shen · 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Unconditional Image Generation | CIFAR-10 32×32 (test) | FID 2.86 | 94 |
| Class-conditional Image Generation | ImageNet 256×256 1k (val) | FID 1.79 | 67 |
| Class-conditional Image Generation | ImageNet-1K 512×512 (val) | FID 2.23 | 33 |
