Distribution Matching Distillation Meets Reinforcement Learning
About
Distribution Matching Distillation (DMD) facilitates efficient inference by distilling multi-step diffusion models into few-step variants. Concurrently, Reinforcement Learning (RL) has emerged as a vital tool for aligning generative models with human preferences. While both represent critical post-training stages for large-scale diffusion models, existing studies typically treat them as independent, sequential processes, leaving a systematic framework for their unification largely unexplored. In this work, we demonstrate that jointly optimizing these two objectives yields mutual benefits: RL enables more preference-aware and controllable distillation rather than uniformly compressing the full data distribution, while DMD serves as an effective regularizer to mitigate reward hacking during RL training. Building on these insights, we propose DMDR, a unified framework that incorporates Reward-Tilted Distribution Matching optimization alongside two dynamic distillation training strategies in the initial stage, followed by the joint DMD and RL optimization in the second stage. Extensive experiments demonstrate that DMDR achieves state-of-the-art visual quality and prompt adherence among few-step generation methods, even surpassing the performance of its multi-step teacher model.
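The abstract does not spell out how the "reward-tilted" target is defined. As a rough illustration only, the sketch below assumes the standard exponential tilting from KL-regularized RL, where the target is proportional to the teacher distribution multiplied by exp(reward / beta). The three-outcome toy distribution, the function names, and the `beta` temperature are hypothetical and are not taken from the paper.

```python
import math

# Assumption: "reward-tilted" means the usual exponential tilting,
# q(x) proportional to p_teacher(x) * exp(r(x) / beta). The paper may define it differently.

def reward_tilt(p_teacher, rewards, beta=1.0):
    """Return the tilted distribution over a finite set of outcomes."""
    weights = [p * math.exp(r / beta) for p, r in zip(p_teacher, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_regularised_objective(q, p_teacher, rewards, beta=1.0):
    """KL(q || p_teacher) - E_q[r] / beta; lower is better."""
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p_teacher) if qi > 0)
    expected_reward = sum(qi * ri for qi, ri in zip(q, rewards))
    return kl - expected_reward / beta

# Toy example: three outcomes, the least likely one has the highest reward.
p_teacher = [0.5, 0.3, 0.2]
rewards = [0.0, 1.0, 2.0]
q_star = reward_tilt(p_teacher, rewards, beta=1.0)

# The tilted distribution shifts mass toward high-reward outcomes and
# scores better on the regularised objective than the untilted teacher.
obj_tilted = kl_regularised_objective(q_star, p_teacher, rewards)
obj_teacher = kl_regularised_objective(p_teacher, p_teacher, rewards)
```

Under this assumption, the KL term plays the regularizing role the abstract attributes to DMD: a larger `beta` keeps the target closer to the teacher, while a smaller `beta` lets the reward dominate.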
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score | 56 | 506 |
| Text-to-Image Generation | ShareGPT-4o-Image SDXL-Base | CLIP Score | 35.6241 | 9 |
| Text-to-Image Generation | COCO 10K prompts 2014 (karpathy 30K) | HPS | 29.5 | 9 |
| Text-to-Image Generation | ShareGPT-4o-Image SD3-Medium | CLIP Score | 35.0142 | 7 |
| Text-to-Image Generation (Stable Diffusion 3.5 Medium Comparison) | COCO 10K prompts 2014 (Karpathy) | HPS | 30.83 | 6 |
| Text-to-Image Generation | ShareGPT-4o-Image SD3.5-Large | CLIP Score | 35.9757 | 3 |
| Text-to-Image Generation | DrawBench | GenEval Score | 0.73 | 3 |
| Class-to-image generation | ImageNet 50k (val) | FID | 9.6341 | 3 |