
Targeted Forgetting of Image Subgroups in CLIP Models

About

Foundation models (FMs) such as CLIP have demonstrated impressive zero-shot performance across various tasks by leveraging large-scale, unsupervised pre-training. However, they often inherit harmful or unwanted knowledge from noisy internet-sourced datasets, compromising their reliability in real-world applications. Existing model unlearning methods either rely on access to the pre-training datasets or focus on coarse-grained unlearning (e.g., entire classes), leaving a critical gap for fine-grained unlearning. In this paper, we address the challenging scenario of selectively forgetting specific portions of knowledge within a class, without access to the pre-training data, while preserving the model's overall performance. We propose a novel three-stage approach that progressively unlearns targeted knowledge while mitigating over-forgetting. It consists of (1) a forgetting stage that fine-tunes CLIP on the samples to be forgotten, (2) a reminding stage that restores performance on retained samples, and (3) a restoring stage that recovers zero-shot capabilities using model souping. Additionally, we introduce knowledge distillation to handle the distribution disparity among the forgetting samples, the retaining samples, and the unseen pre-training data. Extensive experiments on CIFAR-10, ImageNet-1K, and style datasets demonstrate that our approach effectively unlearns specific subgroups while maintaining strong zero-shot performance on semantically similar subgroups and other categories. It significantly outperforms baseline unlearning methods, which lose effectiveness under the CLIP unlearning setting.
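The abstract does not say how the model souping in stage (3) is computed. As a rough illustration only, the sketch below assumes the common weight-space form of souping: a linear interpolation between the original checkpoint and the unlearned one. The function name `soup`, the mixing coefficient `alpha`, and the flat-list weight format are hypothetical choices for this example and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def soup(original: Weights, unlearned: Weights, alpha: float = 0.5) -> Weights:
    """Linearly interpolate two checkpoints that share names and shapes.

    alpha = 0 returns the original weights, alpha = 1 the unlearned ones.
    This is an assumed form of souping, not the paper's exact recipe.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if original.keys() != unlearned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    mixed: Weights = {}
    for name, orig_values in original.items():
        new_values = unlearned[name]
        if len(orig_values) != len(new_values):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise convex combination of the two checkpoints.
        mixed[name] = [
            (1.0 - alpha) * o + alpha * u
            for o, u in zip(orig_values, new_values)
        ]
    return mixed
```

Under this assumption, a larger `alpha` keeps more of the unlearned behaviour, while a smaller `alpha` stays closer to the original zero-shot model; how the authors actually choose the mix is not stated on this page.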

Zeliang Zhang, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Chenliang Xu • 2025

Related benchmarks

Task                            Dataset       Metric                Result  Rank
Image Classification            ObjectNet     Accuracy              24.49   251
Image Classification            Food          Accuracy              69.1    152
Image Classification            STL           Top-1 Acc             91.53   89
Continual Unlearning            ImageNet-1K   Retention Score       50.17   60
Single-class Unlearning         CIFAR-10      Retain Accuracy       73.96   42
Machine Unlearning              ImageNet      Utility Preservation  46.87   33
Zero-shot Image Classification  CIFAR-10      Zero-shot Accuracy    89.86   18
Zero-shot Image Classification  Food          Zero-shot Accuracy    79      18
Image Classification            ImageNet      Target Accuracy       66.27   18
Machine Unlearning              STL           Accuracy              90.38   18
Showing 10 of 20 rows
