Targeted Forgetting of Image Subgroups in CLIP Models
About
Foundation models (FMs) such as CLIP have demonstrated impressive zero-shot performance across various tasks by leveraging large-scale, unsupervised pre-training. However, they often inherit harmful or unwanted knowledge from noisy internet-sourced datasets, compromising their reliability in real-world applications. Existing model unlearning methods either rely on access to the pre-training data or focus on coarse-grained unlearning (e.g., entire classes), leaving a critical gap for fine-grained unlearning. In this paper, we address the challenging scenario of selectively forgetting specific portions of knowledge within a class, without access to the pre-training data, while preserving the model's overall performance. We propose a novel three-stage approach that progressively unlearns targeted knowledge while mitigating over-forgetting. It consists of (1) a forgetting stage that fine-tunes CLIP on the samples to be forgotten, (2) a reminding stage that restores performance on retained samples, and (3) a restoring stage that recovers zero-shot capabilities using model souping. Additionally, we introduce knowledge distillation to handle the distribution disparity among the forgetting samples, the retained samples, and the unseen pre-training data. Extensive experiments on CIFAR-10, ImageNet-1K, and style datasets demonstrate that our approach effectively unlearns specific subgroups while maintaining strong zero-shot performance on semantically similar subgroups and other categories, significantly outperforming baseline unlearning methods, which lose effectiveness under the CLIP unlearning setting.
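To make the restoring stage more concrete, the snippet below is a minimal sketch of model souping as plain weight interpolation between two checkpoints. The abstract does not give the exact souping recipe, so the linear form, the function name `soup`, the coefficient `alpha`, and the toy dictionary-of-lists checkpoint format are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List

# Toy checkpoint format (an assumption): parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def soup(original: StateDict, finetuned: StateDict, alpha: float = 0.5) -> StateDict:
    """Blend two checkpoints as (1 - alpha) * original + alpha * finetuned.

    alpha = 0 returns the original (zero-shot) weights, alpha = 1 returns the
    fine-tuned (unlearned) weights. The linear rule is an illustrative choice.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if original.keys() != finetuned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name, w_orig in original.items():
        w_ft = finetuned[name]
        if len(w_orig) != len(w_ft):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise interpolation of the two weight vectors.
        merged[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(w_orig, w_ft)]
    return merged


if __name__ == "__main__":
    zero_shot = {"proj.weight": [0.0, 2.0]}
    unlearned = {"proj.weight": [2.0, 4.0]}
    print(soup(zero_shot, unlearned, alpha=0.5))
```

In this sketch a smaller `alpha` keeps the blend closer to the original zero-shot model, while a larger one keeps more of the unlearned behavior. How that trade-off is actually set in the paper is not stated here.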
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | ObjectNet | Accuracy | 24.49 | 251 |
| Image Classification | Food | Accuracy | 69.1 | 152 |
| Image Classification | STL | Top-1 Acc | 91.53 | 89 |
| Continual Unlearning | ImageNet-1K | Retention Score | 50.17 | 60 |
| Single-class Unlearning | CIFAR-10 | Retain Accuracy | 73.96 | 42 |
| Machine Unlearning | ImageNet | Utility Preservation | 46.87 | 33 |
| Zero-shot Image Classification | CIFAR-10 | Zero-shot Accuracy | 89.86 | 18 |
| Zero-shot Image Classification | Food | Zero-shot Accuracy | 79 | 18 |
| Image Classification | ImageNet | Target Accuracy | 66.27 | 18 |
| Machine Unlearning | STL | Accuracy | 90.38 | 18 |