Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
About
The recent proliferation of large-scale text-to-image models has led to growing concerns that such models may be misused to generate harmful, misleading, and inappropriate content. Motivated by this issue, we derive a technique inspired by continual learning to selectively forget concepts in pretrained deep generative models. Our method, dubbed Selective Amnesia, enables controllable forgetting where a user can specify how a concept should be forgotten. Selective Amnesia can be applied to conditional variational likelihood models, which encompass a variety of popular deep generative frameworks, including variational autoencoders and large-scale text-to-image diffusion models. Experiments across different models demonstrate that our approach induces forgetting on a variety of concepts, from entire classes in standard datasets to celebrity and nudity prompts in text-to-image models. Our code is publicly available at https://github.com/clear-nus/selective-amnesia.
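Concretely, the continual-learning ingredient is an elastic-weight-consolidation (EWC) style anchor: the model is trained to fit a user-chosen surrogate distribution on the concept to forget, while a Fisher-weighted penalty (plus replay data for remembered concepts) keeps the remaining parameters close to the pretrained ones. The following is a minimal numerical sketch of how such a combined objective could be assembled; the function names, the diagonal-Fisher assumption, and the weighting `lam` are illustrative, not the repository's actual API.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Fisher-weighted squared distance anchoring theta to the
    pretrained parameters theta_star (diagonal Fisher assumed)."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def selective_amnesia_loss(surrogate_nll, replay_nll,
                           theta, theta_star, fisher, lam=100.0):
    """Illustrative total loss: fit the surrogate on the forgotten
    concept, preserve remembered concepts via replay, and regularize
    with the EWC anchor. All inputs are scalars/arrays for clarity."""
    return surrogate_nll + replay_nll + ewc_penalty(theta, theta_star,
                                                    fisher, lam)

# Toy usage: identical parameters incur no EWC cost; drifting
# parameters are penalized in proportion to their Fisher weight.
theta_star = np.array([1.0, -2.0, 0.5])
fisher = np.array([0.1, 1.0, 10.0])
print(ewc_penalty(theta_star, theta_star, fisher))        # 0.0
print(ewc_penalty(theta_star + 1.0, theta_star, fisher))  # > 0
```

In practice `surrogate_nll` and `replay_nll` would be variational bounds (e.g. the diffusion or VAE ELBO) evaluated on surrogate samples for the forgotten concept and on replayed samples for everything else; this sketch only shows how the three terms combine.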
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Concept Unlearning | UnlearnDiffAtk | UnlearnDiffAtk | 0.268 | 36 |
| Explicit Content Removal | I2P | Armpits Count | 72 | 28 |
| Concept Unlearning | Ring-a-Bell | Ring-A-Bell Score | 32.9 | 20 |
| Safe Text-to-Image Generation | MMA-Diffusion | Automatic Safety Rate | 20.5 | 20 |
| Text-to-Image Generation | Non-targeted concepts | CLIP Score | 30.6 | 18 |
| Concept Unlearning | I2P | I2P | 0.062 | 17 |
| Concept Unlearning | MMA-Diffusion | MMA-Diffusion | 20.5 | 16 |
| Concept Unlearning | P4D | P4D | 0.623 | 14 |
| Safe Text-to-Image Generation | I2P | ASR | 0.062 | 13 |
| Nudity Concept Erasure | MMA Adversarial Prompts | Erase Rate (%) | 89.6 | 13 |