Three Creates All: You Only Sample 3 Steps

About

Diffusion models deliver high-fidelity generation but remain slow at inference time because sampling requires many sequential network evaluations. We find that standard timestep conditioning becomes a key bottleneck for few-step sampling. Motivated by layer-dependent denoising dynamics, we propose Multi-layer Time Embedding Optimization (MTEO), which freezes the pretrained diffusion backbone and distills a small set of step-wise, layer-wise time embeddings from reference trajectories. MTEO is plug-and-play with existing ODE solvers, adds no inference-time overhead, and trains only a tiny fraction of the parameters. Extensive experiments across diverse datasets and backbones show state-of-the-art performance in few-step sampling and substantially narrow the gap between distillation-based and lightweight methods. Code will be available.

Yuren Cai, Guangyi Wang, Zongqing Li, Li Li, Zhihui Liu, Songzhi Su • 2026
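
The idea in the abstract above (keep the backbone frozen and learn one time embedding per sampling step and per layer so that a few-step solver tracks a many-step reference trajectory) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical two-layer scalar "backbone", a plain Euler solver, and finite-difference gradient descent with step halving in place of backpropagation. All names and constants (`backbone`, `standard_emb`, `distill`, `W1`, `W2`, `A`) are invented for illustration.

```python
import math

# Frozen toy "backbone": two scalar layers, each shifted by its own time embedding.
W1, W2 = 1.5, -0.8   # frozen weights (hypothetical values)
A = (0.7, -0.3)      # frozen slopes of the standard embedding e_l(t) = A[l] * t


def backbone(x, emb):
    """Predicted velocity dx/dt for state x and per-layer embeddings emb = [e1, e2]."""
    h = math.tanh(W1 * x + emb[0])
    return W2 * h + emb[1]


def standard_emb(t):
    """Standard conditioning: every layer's embedding is a fixed function of t."""
    return [a * t for a in A]


def euler(x, times, embs):
    """Euler-integrate along `times`; embs[i] conditions step i. Returns all states."""
    states = [x]
    for i in range(len(times) - 1):
        x = x + (times[i + 1] - times[i]) * backbone(x, embs[i])
        states.append(x)
    return states


def loss(table, x0s, times, targets):
    """Mean squared gap between few-step states and reference states."""
    total = 0.0
    for x0, tgt in zip(x0s, targets):
        states = euler(x0, times, table)
        total += sum((s - r) ** 2 for s, r in zip(states[1:], tgt))
    return total / len(x0s)


def distill(x0s, steps=3, fine_per_step=100, iters=300, lr=0.5, eps=1e-5):
    """Learn a steps-by-layers embedding table; the backbone weights never change."""
    n_fine = steps * fine_per_step
    fine = [1.0 - k / n_fine for k in range(n_fine + 1)]  # t runs from 1 to 0
    coarse = fine[::fine_per_step]                        # the few-step time grid
    fine_embs = [standard_emb(t) for t in fine[:-1]]
    # Reference trajectories: many Euler steps with standard conditioning,
    # read off at the coarse grid points.
    targets = [euler(x0, fine, fine_embs)[fine_per_step::fine_per_step] for x0 in x0s]

    table = [standard_emb(t) for t in coarse[:-1]]        # start from standard embeddings
    base = current = loss(table, x0s, coarse, targets)
    for _ in range(iters):
        # Forward finite-difference gradient with respect to every table entry.
        grad = [[0.0] * len(A) for _ in table]
        for s in range(len(table)):
            for l in range(len(A)):
                table[s][l] += eps
                grad[s][l] = (loss(table, x0s, coarse, targets) - current) / eps
                table[s][l] -= eps
        trial = [[e - lr * g for e, g in zip(row, g_row)] for row, g_row in zip(table, grad)]
        trial_loss = loss(trial, x0s, coarse, targets)
        if trial_loss < current:      # accept only improving updates
            table, current = trial, trial_loss
        else:                         # otherwise shrink the step size
            lr *= 0.5
    return table, base, current


if __name__ == "__main__":
    table, base, tuned = distill([-1.0, -0.3, 0.4, 1.2])
    print(f"3-step error before: {base:.3e}, after: {tuned:.3e}")
```

In this sketch the learned table has only `steps * layers` entries and is looked up by step index at sampling time, so the 3-step sampler performs the same number of backbone evaluations as before. That is the sense in which such a scheme adds no inference-time overhead.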

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Generation | ImageNet 256x256 | IS 211 | 359 |
| Image Generation | CIFAR-10 | FID 2.5 | 203 |
| Text-to-Image Generation | MS-COCO (val) | FID 12.92 | 202 |
| Image Generation | LSUN bedroom | FID 4.44 | 105 |
| Image Generation | ImageNet 64 | FID 3.81 | 100 |
| Image Generation | FFHQ | FID 2.99 | 70 |
| Image Generation | CIFAR-10 | FID 2.5 | 16 |
