
Learning Few-Step Diffusion Models by Trajectory Distribution Matching

About

Accelerating diffusion model sampling is crucial for efficient AIGC deployment. While diffusion distillation methods -- based on distribution matching and trajectory matching -- reduce sampling to as few as one step, they fall short on complex tasks like text-to-image generation. Few-step generation offers a better balance between speed and quality, but existing approaches face a persistent trade-off: distribution matching lacks flexibility for multi-step sampling, while trajectory matching often yields suboptimal image quality. To bridge this gap, we propose learning few-step diffusion models by Trajectory Distribution Matching (TDM), a unified distillation paradigm that combines the strengths of distribution and trajectory matching. Our method introduces a data-free score distillation objective, aligning the student's trajectory with the teacher's at the distribution level. Further, we develop a sampling-steps-aware objective that decouples learning targets across different steps, enabling more adjustable sampling. This approach supports both deterministic sampling for superior image quality and flexible multi-step adaptation, achieving state-of-the-art performance with remarkable efficiency. Our model, TDM, outperforms existing methods on various backbones, such as SDXL and PixArt-$\alpha$, delivering superior quality and significantly reduced training costs. In particular, our method distills PixArt-$\alpha$ into a 4-step generator that outperforms its teacher on real user preference at 1024 resolution. This is accomplished with 500 iterations and 2 A800 hours -- a mere 0.01% of the teacher's training cost. In addition, our proposed TDM can be extended to accelerate text-to-video diffusion. Notably, TDM can outperform its teacher model (CogVideoX-2B) by using only 4 NFE on VBench, improving the total score from 80.91 to 81.65. Project page: https://tdm-t2x.github.io/
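To give a feel for the distribution-level alignment the abstract describes, here is a minimal, self-contained toy sketch of data-free score distillation, not the paper's actual TDM algorithm. It assumes 1-D Gaussian teacher and student distributions so both scores are analytic (the score of N(mu, s^2) at x is -(x - mu)/s^2); the student parameter `mu_student` and the learning rate are illustrative choices. The student is updated along the difference between its own ("fake") score and the teacher's score, evaluated on the student's own samples, which pulls the student distribution toward the teacher's without touching any real data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed teacher distribution N(3, 1); student starts far away at mean -2.
mu_teacher, sigma = 3.0, 1.0
mu_student = -2.0
lr = 0.05


def gaussian_score(x, mu, s):
    """Score (gradient of log density) of N(mu, s^2) at x."""
    return -(x - mu) / s**2


for step in range(400):
    # Data-free: sample from the *student's* current distribution, no real data.
    x = mu_student + sigma * rng.standard_normal(256)
    # Distribution-matching direction: fake (student) score minus teacher score,
    # averaged over the student's samples (chain rule: dx/dmu_student = 1).
    grad = np.mean(
        gaussian_score(x, mu_student, sigma) - gaussian_score(x, mu_teacher, sigma)
    )
    mu_student -= lr * grad

# After training, the student's mean has moved to the teacher's.
print(round(mu_student, 3))
```

In this toy setting the update direction reduces to (mu_student - mu_teacher)/sigma^2, so the student mean converges geometrically to the teacher's. The paper's sampling-steps-aware objective additionally decouples such targets across the different steps of a few-step sampler, which this single-parameter sketch does not attempt to model.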

Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, Jing Tang• 2025

Related benchmarks

Task                            | Dataset                       | Result               | Rank
--------------------------------|-------------------------------|----------------------|-----
Text-to-Image Generation        | GenEval                       | Overall Score 50     | 391
Text-to-Image Generation        | MS-COCO                       | FID 20.44            | 131
Text-to-Video Generation        | T2V-CompBench                 | --                   | 63
Compositional Image Generation  | GenEval                       | Overall Score 0.61   | 44
Text-to-Image Generation        | HPS v2.1                      | Score (Anime) 32.91  | 30
Text-to-Image Generation        | COCO-10K                      | CLIP Score 0.3224    | 16
Image Generation                | DrawBench                     | Aesthetic Score 5.41 | 10
Visual Text Rendering           | Visual Text Rendering         | OCR Accuracy 55      | 8
Text-to-Image Generation        | ShareGPT-4o-Image SD3-Medium  | CLIP Score 34.0301   | 7
Text-to-Image Generation        | COCO 1K                       | --                   | 7

(Showing 10 of 11 rows)
