Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Cross-Resolution Distribution Matching for Diffusion Distillation

About

Diffusion distillation is central to accelerating image and video generation, yet existing methods are fundamentally limited by the denoising process, where step reduction has largely saturated. Partial timestep low-resolution generation can further accelerate inference, but it suffers noticeable quality degradation due to cross-resolution distribution gaps. We propose Cross-Resolution Distribution Matching Distillation (RMD), a novel distillation framework that bridges cross-resolution distribution gaps for high-fidelity, few-step multi-resolution cascaded inference. Specifically, RMD divides the timestep intervals for each resolution using logarithmic signal-to-noise ratio (logSNR) curves, and introduces logSNR-based mapping to compensate for resolution-induced shifts. Distribution matching is conducted along resolution trajectories to reduce the gap between low-resolution generator distributions and the teacher's high-resolution distribution. In addition, a predicted-noise re-injection mechanism is incorporated during upsampling to stabilize training and improve synthesis quality. Quantitative and qualitative results show that RMD preserves high-fidelity generation while accelerating inference across various backbones. Notably, RMD achieves up to 33.4X speedup on SDXL and 25.6X on Wan2.1-14B, while preserving high visual fidelity.

Feiyang Chen, Hongpeng Pan, Haonan Xu, Xinyu Duan, Yang Yang, Zhefeng Wang• 2026

Related benchmarks

TaskDatasetResultRank
Text-to-Video GenerationT2V-CompBench--
63
Text-to-Video GenerationVBench original prompts (test)
Total Score82.51
4
Showing 2 of 2 rows

Other info

Follow for update