
Distribution Matching Distillation Meets Reinforcement Learning

About

Distribution Matching Distillation (DMD) facilitates efficient inference by distilling multi-step diffusion models into few-step variants. Concurrently, Reinforcement Learning (RL) has emerged as a vital tool for aligning generative models with human preferences. While both represent critical post-training stages for large-scale diffusion models, existing studies typically treat them as independent, sequential processes, leaving a systematic framework for their unification largely unexplored. In this work, we demonstrate that jointly optimizing these two objectives yields mutual benefits: RL enables more preference-aware and controllable distillation rather than uniformly compressing the full data distribution, while DMD serves as an effective regularizer that mitigates reward hacking during RL training. Building on these insights, we propose DMDR, a unified framework that incorporates Reward-Tilted Distribution Matching optimization alongside two dynamic distillation training strategies in the initial stage, followed by joint DMD and RL optimization in the second stage. Extensive experiments demonstrate that DMDR achieves state-of-the-art visual quality and prompt adherence among few-step generation methods, even surpassing the performance of its multi-step teacher model.
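The abstract's "Reward-Tilted Distribution Matching" can be read as reweighting the teacher distribution toward high-reward samples, i.e. a target of the form p_tilt(x) ∝ p(x) · exp(r(x) / β). The sketch below is an illustrative toy example on a discrete distribution, not the paper's implementation; the temperature `beta` and the helper `reward_tilted` are assumptions for exposition.

```python
import math

def reward_tilted(p, r, beta=1.0):
    """Reweight a discrete distribution p by rewards r and renormalize.

    Implements the exponential tilting p_tilt(x) ∝ p(x) * exp(r(x) / beta):
    higher-reward outcomes gain probability mass, lower beta tilts harder.
    """
    w = [pi * math.exp(ri / beta) for pi, ri in zip(p, r)]
    z = sum(w)
    return [wi / z for wi in w]

# Toy teacher distribution over three outcomes and their rewards.
p = [0.5, 0.3, 0.2]
r = [0.0, 1.0, 2.0]
pt = reward_tilted(p, r, beta=1.0)
# Mass shifts toward the highest-reward outcome while pt remains a
# valid probability distribution.
```

In the distillation view, matching the student to such a tilted target (rather than to the raw teacher distribution) is what makes the compression preference-aware instead of uniform.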

Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Changsheng Lu, Zhen Li, Bo Zhang, Mengmeng Wang, Steven Hoi, Peng Gao, Harry Yang• 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Text-to-Image Generation | GenEval | Overall Score | 56 | 506 |
| Text-to-Image Generation | ShareGPT-4o-Image SDXL-Base | CLIP Score | 35.6241 | 9 |
| Text-to-Image Generation | COCO 10K prompts 2014 (Karpathy 30K) | HPS | 29.5 | 9 |
| Text-to-Image Generation | ShareGPT-4o-Image SD3-Medium | CLIP Score | 35.0142 | 7 |
| Text-to-Image Generation (Stable Diffusion 3.5 Medium Comparison) | COCO 10K prompts 2014 (Karpathy) | HPS | 30.83 | 6 |
| Text-to-Image Generation | ShareGPT-4o-Image SD3.5-Large | CLIP Score | 35.9757 | 3 |
| Text-to-Image Generation | DrawBench | GenEval Score | 0.73 | 3 |
| Class-to-Image Generation | ImageNet 50k (val) | FID | 9.6341 | 3 |
