QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
About
The practical deployment of diffusion models is still hindered by high memory and computational overhead. Although quantization paves the way for model compression and acceleration, existing methods struggle to achieve low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty, and propose adjusting these distributions through weight finetuning to make them more quantization-friendly. We provide both theoretical and empirical evidence supporting finetuning as a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those responsible for retaining essential temporal information and those particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate performance degradation while improving quantization efficiency. Our method demonstrates its efficacy across three high-resolution image generation tasks, obtaining state-of-the-art performance across multiple bit-width settings.
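The core intuition above — that layers with imbalanced (e.g., outlier-heavy) activation distributions suffer the most from low-bit quantization and are therefore the ones worth finetuning — can be illustrated with a minimal sketch. This is not the paper's implementation; the function names and the MSE-based sensitivity ranking are illustrative assumptions using a simple asymmetric uniform quantizer.

```python
import numpy as np

def uniform_quantize(x, n_bits=4):
    """Asymmetric uniform fake-quantization: quantize to n_bits, then dequantize."""
    qmax = 2 ** n_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, qmax)
    return q * scale + lo

def quantization_error(x, n_bits=4):
    """Mean squared error introduced by fake-quantizing the activations x."""
    return float(np.mean((x - uniform_quantize(x, n_bits)) ** 2))

def select_sensitive_layers(activations, n_bits=4, top_k=2):
    """Rank layers by activation quantization error and return the top_k
    most sensitive layer names (illustrative candidates for finetuning)."""
    errors = {name: quantization_error(a, n_bits) for name, a in activations.items()}
    return sorted(errors, key=errors.get, reverse=True)[:top_k]

# A layer with a single activation outlier stretches the quantization range,
# inflating the rounding error for all other values -- the "imbalanced
# distribution" failure mode the paper targets.
rng = np.random.default_rng(0)
activations = {
    "balanced": rng.normal(0.0, 1.0, size=1000),
    "outlier": np.concatenate([rng.normal(0.0, 1.0, size=999), [50.0]]),
}
print(select_sensitive_layers(activations, n_bits=4, top_k=1))
```

Under 4-bit quantization, the outlier layer's scale is roughly an order of magnitude larger than the balanced layer's, so its reconstruction error dominates and it is selected for finetuning.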
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-conditional Image Generation | ImageNet 256x256 (val) | FID | 8.45 | 293 |
| Image Super-resolution | DRealSR | MANIQA | 0.3541 | 78 |
| Image Generation | LSUN Bedroom 256x256 (test) | FID | 10.1 | 73 |
| Real-world Image Super-Resolution | RealLR200 | MUSIQ | 37.21 | 26 |
| Real-world Image Super-Resolution | RealLQ250 | MUSIQ | 0.3687 | 26 |
| Real-world Image Super-Resolution | DRealSR | LPIPS | 0.8322 | 23 |
| Real-world Image Super-Resolution | RealSR | LPIPS | 0.8663 | 23 |
| Conditional Image Generation | ImageNet 256x256 | FID | 5.98 | 22 |
| Conditional Image Generation | ImageNet 256x256 CFG=1.5 1K (val) | IS | 4.87 | 18 |
| Unconditional Generation | LSUN Church 256x256 (test) | FID | 6.83 | 11 |