
Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers

About

Since the recent advent of data protection regulations (e.g., the General Data Protection Regulation), there has been increasing demand for deleting information learned from sensitive data in pre-trained models without retraining from scratch. The inherent vulnerability of neural networks to adversarial attacks and unfairness also calls for a robust method to remove or correct information in an instance-wise fashion while retaining predictive performance on the remaining data. To this end, we consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model, either by misclassifying each instance away from its original prediction or by relabeling it to a different label. We also propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation level and 2) leveraging weight importance metrics to pinpoint the network parameters responsible for propagating unwanted information. Both methods require only the pre-trained model and the data instances to forget, allowing painless application to real-life settings where the entire training set is unavailable. Through extensive experimentation on various image classification benchmarks, we show that our approach effectively preserves knowledge of the remaining data while unlearning the given instances in both single-task and continual unlearning scenarios.

Sungmin Cha, Sungjun Cho, Dasol Hwang, Honglak Lee, Taesup Moon, Moontae Lee (2023)
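Two ideas from the abstract can be illustrated on a toy model: pushing a forget instance away from its original prediction (or toward a new label), and regularizing with adversarial examples so that the rest of the decision boundary is preserved. The following is a minimal, hypothetical sketch in pure Python, not the authors' code. It assumes a linear softmax classifier, gradient ascent on the cross-entropy of the forget instance, and a simple signed-gradient targeted attack whose outputs are labelled by the pre-trained model. All function names (`unlearn`, `targeted_attack`, `ce_grads`) and hyperparameters (`lr`, `lam`, `eps`) are illustrative assumptions, and the weight-importance component is not shown.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])


def ce_grads(W, b, x, y):
    """Gradients of the cross-entropy -log p_y(x) w.r.t. W, b and the input x."""
    p = softmax(logits(W, b, x))
    d = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # dL/dlogits
    gW = [[dk * xi for xi in x] for dk in d]
    gx = [sum(d[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]
    return gW, d, gx


def targeted_attack(W, b, x, target, eps=0.25, max_steps=10):
    """Hypothetical attack: signed-gradient steps lowering CE toward `target`."""
    x = list(x)
    for _ in range(max_steps):
        if predict(W, b, x) == target:
            break
        _, _, gx = ce_grads(W, b, x, target)
        x = [xi - eps * ((g > 0) - (g < 0)) for xi, g in zip(x, gx)]
    return x


def unlearn(W, b, x_f, y_f, new_label=None, lr=0.5, lam=1.0, max_steps=100):
    """Hypothetical instance-wise unlearning of a single sample (x_f, y_f)."""
    W = [row[:] for row in W]  # work on a copy of the pre-trained weights
    b = b[:]
    # Assumption of this sketch: adversarial examples around x_f, labelled by the
    # pre-trained model, stand in for the unavailable remaining data.
    advs = [targeted_attack(W, b, x_f, t) for t in range(len(W)) if t != y_f]
    retain = [(xa, predict(W, b, xa)) for xa in advs]
    for _ in range(max_steps):
        pred = predict(W, b, x_f)
        done = (pred != y_f) if new_label is None else (pred == new_label)
        if done:
            break
        # Forget term: ascend CE on the original label, or descend CE on new_label.
        terms = [(x_f, y_f, 1.0) if new_label is None else (x_f, new_label, -1.0)]
        # Retain term: descend CE on the adversarial examples, weighted by lam.
        terms += [(xa, ya, -lam) for xa, ya in retain]
        # Compute every gradient at the current weights before updating.
        grads = [(c, ce_grads(W, b, x, y)) for x, y, c in terms]
        for c, (gW, gb, _) in grads:
            for k in range(len(W)):
                for i in range(len(W[k])):
                    W[k][i] += lr * c * gW[k][i]
                b[k] += lr * c * gb[k]
    return W, b
```

For example, with hand-set weights `W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]` and zero biases, `unlearn(W, b, [1.0, 0.2], 0)` moves the forget point off class 0, while the points (0, 1.5), (-1.5, -1) and (2, 0) keep their original labels 1, 2 and 0. Labelling the adversarial examples with the pre-trained predictions is a design choice of this sketch: it only helps when the attack actually reaches another class, since an adversarial example still labelled `y_f` would pull against the forget term.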

Related benchmarks

Task | Dataset | Metric | Value | Rank
Machine Unlearning | CIFAR-10 | Accf | 3.28 | 45
Machine Unlearning | Tiny-ImageNet (train) | Removal Accuracy (Train) | 23.9 | 41
Class Unlearning | CIFAR-10 | Retain Accuracy | 89.15 | 39
Single-class Unlearning | CIFAR-100 | ACCr | 71.56 | 28
Single-class Unlearning | MNIST | Accuracy Retention (ACCr) | 0.9915 | 28
Single-class Unlearning | CIFAR-10 | Forget Accuracy | 12.36 | 16
Machine Unlearning | Tiny-ImageNet 200 classes (train test) | Acctr (Residual) | 92.69 | 13
Machine Unlearning | Tiny ImageNet (test) | Residual Accuracy | 19.1 | 13
Machine Unlearning | RAF-DB (train) | Accuracy Retention | 99.99 | 12
Class Unlearning | CIFAR-10 | U-LiRA Accuracy | 85.67 | 12
Showing 10 of 22 rows
