Universal Guidance for Diffusion Models
About
Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at https://github.com/arpitbansal297/Universal-Guided-Diffusion.
Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta, Micah Goldblum, Jonas Geiping, Tom Goldstein • 2023
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class-conditional Image Generation | ImageNet | FID 205 | 158 |
| Conditional Image Generation | CIFAR-10 | FID 94 | 77 |
| Class-conditional generation | ImageNet | FID 42.3 | 14 |
| Super-Resolution | ImageNet 16x scale | LPIPS 0.31 | 14 |
| Super-Resolution | ImageNet 4x scale | LPIPS 0.15 | 14 |
| Gaussian Deblurring | ImageNet Gaussian Blur sigma=12 | LPIPS 0.37 | 14 |
| Super Resolution 16x | Cats | LPIPS 0.27 | 14 |
| Gaussian Deblur 12 | Cats | LPIPS 0.32 | 14 |
| Inpainting | Cats | LPIPS 0.21 | 14 |
| Super-Resolution (4x) | Cats | LPIPS 0.11 | 14 |
Showing 10 of 19 rows