Particle Guidance: non-I.I.D. Diverse Sampling with Diffusion Models
About
In light of the widespread success of generative models, a significant amount of research has gone into speeding up their sampling time. However, generative models are often sampled multiple times to obtain a diverse set, incurring a cost that is orthogonal to sampling time. We tackle the question of how to improve diversity and sample efficiency by moving beyond the common assumption of independent samples. We propose particle guidance, an extension of diffusion-based generative sampling where a joint-particle time-evolving potential enforces diversity. We analyze theoretically the joint distribution that particle guidance generates, how to learn a potential that achieves optimal diversity, and the connections with methods in other disciplines. Empirically, we test the framework both in the setting of conditional image generation, where we are able to increase diversity without affecting quality, and in molecular conformer generation, where we reduce the state-of-the-art median error by 13% on average.
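The core idea above, adding the gradient of a joint potential over all particles to each sample's score during reverse diffusion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed RBF kernel potential, the toy Gaussian score standing in for a trained model, the variance-exploding noise schedule, and all function names (`rbf_repulsion`, `toy_score`, `sample_with_guidance`, `mean_pairwise_distance`) are hypothetical choices made for brevity.

```python
import numpy as np


def rbf_repulsion(x, bandwidth=1.0):
    """Per-particle gradient of log Phi = -0.5 * sum_{i,j} exp(-||x_i - x_j||^2 / bandwidth).

    x has shape (n_particles, dim). The result has the same shape and
    pushes each particle away from its neighbours.
    """
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq_dist / bandwidth)
    return (2.0 / bandwidth) * np.sum(kernel[:, :, None] * diff, axis=1)


def toy_score(x, sigma, data_std=1.0):
    """Exact score of N(0, data_std^2 I) perturbed by Gaussian noise of scale sigma.

    Assumption: stands in for a learned score network.
    """
    return -x / (data_std ** 2 + sigma ** 2)


def sample_with_guidance(n_particles, dim, guidance_weight, n_steps=200,
                         sigma_max=5.0, sigma_min=0.01, bandwidth=1.0, seed=0):
    """Jointly sample a set of particles with a repulsive guidance term."""
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps + 1)
    # Approximate prior: pure noise at the largest noise level.
    x = sigma_max * rng.standard_normal((n_particles, dim))
    for s_hi, s_lo in zip(sigmas[:-1], sigmas[1:]):
        step = s_hi ** 2 - s_lo ** 2
        # Independent score for each particle plus the joint-potential gradient.
        drift = toy_score(x, s_hi) + guidance_weight * rbf_repulsion(x, bandwidth)
        x = x + step * drift + np.sqrt(step) * rng.standard_normal(x.shape)
    return x


def mean_pairwise_distance(x):
    """Average Euclidean distance over all ordered pairs of distinct particles."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    n = x.shape[0]
    return dist.sum() / (n * (n - 1))


if __name__ == "__main__":
    plain = sample_with_guidance(32, 2, guidance_weight=0.0)
    guided = sample_with_guidance(32, 2, guidance_weight=2.0)
    print("mean pairwise distance, independent:", mean_pairwise_distance(plain))
    print("mean pairwise distance, guided:     ", mean_pairwise_distance(guided))
```

Running both settings with the same seed gives a quick check of whether the potential spreads the set. Setting `guidance_weight=0.0` recovers ordinary independent sampling, which makes the guidance term easy to ablate. The time-evolving and learned potentials described in the abstract are not reproduced in this sketch.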
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text-to-Image Generation | MS-COCO 2017 (val) | FID 30.131 | 131 |
| Text-to-Image Generation | COCO 2014 (val) | -- | 34 |
| Text-to-Image Generation | COCO truck concept 'a photo of a truck' prompt (test) | BRISQUE 34.18 | 24 |
| Face generation with Gender alignment | FFHQ | Total Variation (TV) 0.043 | 20 |
| Text-to-Image Generation | COCO prompts | Vendi 1.787 | 18 |
| Class-conditional Image Generation | truck concept | BRISQUE 40.11 | 18 |
| Face generation with Age alignment | FFHQ | Total Variation (TV) 0.15 | 15 |
| Face generation with Race alignment | FFHQ | Total Variation (TV) 0.27 | 15 |
| Text-to-Image Generation | bus concept | BRISQUE 34.76 | 15 |
| Text-to-Image Generation | bicycle concept (test) | BRISQUE 49.33 | 15 |