TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
About
Diffusion models, a prevalent framework for image generation, face significant obstacles to broad applicability due to their extended inference times and substantial memory requirements. Efficient post-training quantization (PTQ) is pivotal for addressing these issues in traditional models. Unlike traditional models, however, diffusion models depend heavily on the time-step $t$ to achieve satisfactory multi-round denoising. Usually, $t$ from the finite set $\{1, \ldots, T\}$ is encoded into a temporal feature by a few modules entirely independent of the sampling data. However, existing PTQ methods do not optimize these modules separately; they adopt inappropriate reconstruction targets and complex calibration methods, resulting in severe disturbance of the temporal feature and denoising trajectory, as well as low compression efficiency. To address these issues, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework built upon a Temporal Information Block that depends only on the time-step $t$ and is unrelated to the sampling data. Powered by this pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the full-precision temporal features within a limited time. Equipped with this framework, we can preserve most of the temporal information and ensure end-to-end generation quality. Extensive experiments on various datasets and diffusion models demonstrate our state-of-the-art results. Remarkably, our quantization approach achieves, for the first time, model performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and accelerates quantization by $2.0\times$ on LSUN-Bedrooms $256 \times 256$ compared to previous works. Our code is publicly available at https://github.com/ModelTC/TFMQ-DM.
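The key observation behind FSC is that the time-step $t$ only ever takes values from the finite set $\{1, \ldots, T\}$, so the modules producing the temporal feature can be calibrated exhaustively over all $T$ inputs, with no sampled image data involved. The sketch below illustrates this idea in a minimal, hypothetical form (the function names `timestep_embedding` and `calibrate_scale_over_finite_set` are illustrative, not TFMQ-DM's actual API): it enumerates every sinusoidal time-step embedding and grid-searches a uniform quantizer scale that minimizes reconstruction error over the whole finite set.

```python
import numpy as np

def timestep_embedding(t, dim=8):
    # Sinusoidal time-step embedding, as commonly used in diffusion
    # U-Nets (illustrative simplification of the real module).
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.cos(args), np.sin(args)])

def quantize(x, scale, bits=4):
    # Symmetric uniform quantize-dequantize with the given scale.
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def calibrate_scale_over_finite_set(T=1000, dim=8, bits=4, n_grid=80):
    # Finite-set calibration (sketch): since t is drawn from {1..T},
    # we can evaluate candidate scales against *all* temporal features
    # at once -- no image calibration data is needed.
    embs = np.stack([timestep_embedding(t, dim) for t in range(1, T + 1)])
    max_abs = np.abs(embs).max()
    best_scale, best_err = None, np.inf
    for s in np.linspace(max_abs / n_grid, max_abs, n_grid):
        scale = s / (2 ** (bits - 1) - 1)
        err = np.mean((quantize(embs, scale, bits) - embs) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

scale, err = calibrate_scale_over_finite_set()
print(f"calibrated scale={scale:.5f}, MSE={err:.6f}")
```

Because the calibration set is the entire (small) domain of $t$, this search is cheap and deterministic, which is one reason FSC can cut calibration time relative to data-driven PTQ pipelines.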
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | ImageNet 256×256 (val) | FID | 22.33 | 307 |
| Image Generation | LSUN Bedroom 256×256 (test) | FID | 25.74 | 73 |
| Unconditional Image Generation | FFHQ 256×256 | FID | 9.46 | 64 |
| Class-conditional Image Generation | ImageNet-1K 256×256 (test) | FID | 10.29 | 50 |
| Unconditional Image Generation | CIFAR-10 32×32 | FID | 4.24 | 47 |
| Unconditional Image Generation | LSUN Bedroom 256×256 | FID | 3.14 | 21 |
| Unconditional Image Generation | LSUN Churches 256×256 (test) | FID | 5.51 | 18 |
| Unconditional Image Generation | CelebA-HQ 256×256 | FID | 8.68 | 17 |
| Unconditional Image Generation | LSUN Churches 256×256 | FID | 4.01 | 13 |
| Text-guided Image Generation | MS-COCO (test) | FID | 13.09 | 7 |