DyWeight: Dynamic Gradient Weighting for Few-Step Diffusion Sampling
About
Diffusion Models (DMs) have achieved state-of-the-art generative performance across multiple modalities, yet their sampling process remains prohibitively slow due to the need for hundreds of function evaluations. Recent progress in multi-step ODE solvers has greatly improved efficiency by reusing historical gradients, but existing methods rely on handcrafted coefficients that fail to adapt to the non-stationary dynamics of diffusion sampling. To address this limitation, we propose Dynamic Gradient Weighting (DyWeight), a lightweight, learning-based multi-step solver that introduces a streamlined implicit coupling paradigm. By relaxing classical numerical constraints, DyWeight learns unconstrained time-varying parameters that adaptively aggregate historical gradients while intrinsically scaling the effective step size. This implicit time calibration accurately aligns the solver's numerical trajectory with the model's internal denoising dynamics under large integration steps, avoiding complex decoupled parameterizations and optimizations. Extensive experiments on CIFAR-10, FFHQ, AFHQv2, ImageNet64, LSUN-Bedroom, Stable Diffusion and FLUX.1-dev demonstrate that DyWeight achieves superior visual fidelity and stability with significantly fewer function evaluations, establishing a new state-of-the-art among efficient diffusion solvers. Code is available at https://github.com/Westlake-AGI-Lab/DyWeight
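The abstract describes aggregating historical gradients with unconstrained, time-varying weights whose sum also scales the effective step size. The sketch below illustrates that idea on a toy ODE. It is not the authors' implementation: the names `weighted_multistep`, `grad_fn`, and `decay` are hypothetical, and the weights are set by hand here, whereas DyWeight learns them.

```python
from collections import deque


def weighted_multistep(grad_fn, x, times, weights):
    """Integrate dx/dt = grad_fn(x, t) along `times` using weighted gradient history.

    weights[n][j] multiplies the gradient evaluated j steps before step n.
    Rows are NOT required to sum to 1; their sum rescales the effective step.
    """
    history = deque(maxlen=max(len(row) for row in weights))
    for n in range(len(times) - 1):
        t_cur, t_next = times[n], times[n + 1]
        history.appendleft(grad_fn(x, t_cur))  # one function evaluation per step
        # zip stops at the shorter sequence, so early steps use fewer gradients.
        direction = sum(w * d for w, d in zip(weights[n], history))
        x = x + (t_next - t_cur) * direction
    return x


def decay(x, t):
    """Toy gradient field standing in for a diffusion model's ODE drift."""
    return -x


# Weights of [1.0] at every step reduce to the explicit Euler method.
x_euler = weighted_multistep(decay, 1.0, [0.0, 0.5, 1.0], [[1.0], [1.0]])

# Classical two-step Adams-Bashforth coefficients (they sum to 1).
x_ab2 = weighted_multistep(decay, 1.0, [0.0, 0.5, 1.0], [[1.0], [1.5, -0.5]])

# A weight of 2.0 over a step of 0.5 acts like an Euler step of size 1.0.
x_scaled = weighted_multistep(decay, 1.0, [0.0, 0.5], [[2.0]])
x_long = weighted_multistep(decay, 1.0, [0.0, 1.0], [[1.0]])
```

In this sketch the last two calls produce the same state, which is one way to read the abstract's claim that unconstrained weights implicitly recalibrate the step size without a separate time parameterization.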
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Unconditional Image Generation | CIFAR-10 32x32 (test) | FID 2.13 | 137 |
| Conditional Image Generation | ImageNet 64x64 (val) | FID 3.82 | 87 |
| Unconditional Image Generation | LSUN Bedroom 256x256 | FID 3.45 | 68 |
| Unconditional Image Generation | FFHQ 64x64 (test) | FID 2.77 | 53 |
| Conditional Image Generation | ImageNet 64x64 | FID 3.82 | 47 |
| Unconditional Image Generation | FFHQ 64 x 64 | FID 2.77 | 43 |
| Unconditional Generation | LSUN Bedroom 256x256 (test) | FID 3.45 | 42 |
| Text-to-Image Generation | MS-COCO 30k (val) | -- | 42 |
| Unconditional Image Generation | AFHQ 64x64 v2 (test) | FID 2.13 | 37 |
| Conditional Generation | MS-COCO 512x512 (val) | FID 11.54 | 24 |