Universal Guidance for Diffusion Models
About
Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at https://github.com/arpitbansal297/Universal-Guided-Diffusion.
Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta, Micah Goldblum, Jonas Geiping, Tom Goldstein • 2023
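The guidance idea summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It is a minimal one-dimensional example that assumes a standard noise-prediction diffusion parameterization, a squared-error guidance loss, and a finite-difference gradient in place of automatic differentiation. The names `predict_clean`, `guided_noise`, `eps_model`, `guidance_fn`, and `strength` are hypothetical and chosen for illustration only.

```python
import math


def predict_clean(z_t, eps, alpha_bar):
    """Estimate the clean sample from a noisy sample and a noise prediction."""
    return (z_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)


def guided_noise(z_t, alpha_bar, eps_model, guidance_fn, target, strength, h=1e-5):
    """Shift the predicted noise along the gradient of a guidance loss.

    The loss compares guidance_fn(predicted clean sample) with the target.
    Assumption: a squared-error loss and a central finite difference stand in
    for whatever loss and autodiff a real pipeline would use.
    """
    def loss(z):
        z0_hat = predict_clean(z, eps_model(z), alpha_bar)
        return (guidance_fn(z0_hat) - target) ** 2

    grad = (loss(z_t + h) - loss(z_t - h)) / (2.0 * h)
    return eps_model(z_t) + strength * grad


# Toy setup: a "model" that always predicts zero noise, and an identity
# guidance function whose target is 0.
eps_model = lambda z: 0.0
guidance_fn = lambda x: x

eps_plain = eps_model(1.0)
eps_guided = guided_noise(1.0, 0.25, eps_model, guidance_fn, target=0.0, strength=0.01)

x_plain = predict_clean(1.0, eps_plain, 0.25)
x_guided = predict_clean(1.0, eps_guided, 0.25)
print(x_plain, x_guided)
```

In this toy setup the guided noise estimate moves the predicted clean sample closer to the target, without changing or retraining `eps_model`. That is the property the abstract emphasizes: the guidance function is swapped in at sampling time only.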
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-conditional Image Generation | ImageNet | FID | 205 | 132 |
| Conditional Image Generation | CIFAR-10 | FID | 94 | 71 |
| Conditional Image Generation | Fine-grained Birds | Accuracy | 1.1 | 8 |
| Conditional Image Generation | CelebA-HQ Gender+Age | Accuracy | 75.1 | 7 |
| Conditional Image Generation | CelebA-HQ Gender+Hair | Accuracy | 71.3 | 7 |
| Text-to-Image Generation | HPD v2 | Rew | 1.0423 | 4 |
| Text-to-Image Generation | HPD v2 | Rew | 1.26 | 4 |
| Stylized Image Generation | SD prompts Stylized results 1.4 | Style Loss | 18.04 | 4 |