AMUN: Adversarial Machine UNlearning

About

Machine unlearning, where users can request the deletion of a forget dataset, is becoming increasingly important because of numerous privacy regulations. Initial work on "exact" unlearning (e.g., retraining) incurs large computational overheads. "Approximate" methods are computationally inexpensive, but they have fallen short of the effectiveness of exact unlearning: the models they produce fail to reach comparable accuracy and prediction confidence on the forget and test (i.e., unseen) datasets. Exploiting this observation, we propose a new unlearning method, Adversarial Machine UNlearning (AMUN), that outperforms prior state-of-the-art (SOTA) methods for image classification. AMUN lowers the model's confidence on the forget samples by fine-tuning the model on their corresponding adversarial examples. Adversarial examples naturally belong to the distribution imposed by the model on the input space, so fine-tuning the model on the adversarial examples closest to the corresponding forget samples (a) localizes the changes to the model's decision boundary around each forget sample and (b) avoids drastic changes to the model's global behavior, thereby preserving its accuracy on test samples. When AMUN is used to unlearn a random 10% of CIFAR-10 samples, even SOTA membership inference attacks cannot do better than random guessing.
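The core idea above, fine-tuning on the adversarial examples closest to the forget samples, can be sketched on a toy model. This is a minimal, hypothetical illustration on a linear softmax classifier in NumPy, not the authors' implementation. The helper names (`nearest_adversarial`, `amun_unlearn`, `train`), the increasing-radius L-infinity PGD search used to approximate the "closest" adversarial example, labelling each adversarial example with the model's own (wrong) prediction, and the optional mixing-in of retained samples are all assumptions made for this sketch.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict(W, b, X):
    # Class probabilities of a linear softmax classifier.
    return softmax(X @ W + b)

def input_grad(W, b, X, y):
    # Gradient of the cross-entropy loss w.r.t. the inputs: (p - onehot(y)) @ W^T.
    p = predict(W, b, X)
    p[np.arange(len(y)), y] -= 1.0
    return p @ W.T

def nearest_adversarial(W, b, x, y, eps_start=0.05, eps_step=0.05,
                        n_steps=10, max_eps=5.0):
    """Hypothetical helper: grow the L-inf radius until PGD finds a point
    the model misclassifies; return (x_adv, predicted_label) or (None, None)."""
    eps = eps_start
    while eps <= max_eps:
        alpha = 2.5 * eps / n_steps
        x_adv = x.copy()
        for _ in range(n_steps):
            g = input_grad(W, b, x_adv[None], np.array([y]))[0]
            x_adv = x_adv + alpha * np.sign(g)       # ascend the loss of the true label
            x_adv = x + np.clip(x_adv - x, -eps, eps)  # project back into the eps-ball
        pred = int(predict(W, b, x_adv[None]).argmax())
        if pred != y:
            return x_adv, pred
        eps += eps_step
    return None, None

def train(W, b, X, y, lr=0.5, epochs=50):
    # Full-batch gradient descent on cross-entropy; updates W and b in place.
    n = len(y)
    for _ in range(epochs):
        p = predict(W, b, X)
        p[np.arange(n), y] -= 1.0
        W -= lr * X.T @ p / n
        b -= lr * p.mean(axis=0)
    return W, b

def amun_unlearn(W, b, X_forget, y_forget, X_retain=None, y_retain=None,
                 lr=0.5, epochs=50):
    """Hypothetical AMUN-style step: fine-tune a copy of the model on the
    nearest adversarial examples of the forget samples (labelled with the
    model's wrong predictions), optionally mixed with retained samples."""
    adv_X, adv_y = [], []
    for x, y in zip(X_forget, y_forget):
        x_adv, y_adv = nearest_adversarial(W, b, x, int(y))
        if x_adv is not None:
            adv_X.append(x_adv)
            adv_y.append(y_adv)
    if not adv_X:
        return W.copy(), b.copy()  # nothing found within max_eps; leave model unchanged
    X_ft = np.array(adv_X)
    y_ft = np.array(adv_y, dtype=int)
    if X_retain is not None:
        X_ft = np.vstack([X_ft, X_retain])
        y_ft = np.concatenate([y_ft, y_retain])
    return train(W.copy(), b.copy(), X_ft, y_ft, lr=lr, epochs=epochs)
```

Searching radii in increasing order keeps each adversarial example close to its forget sample, which is what confines the change to the decision boundary near that sample. For example, with a hand-set two-class model and a single correctly classified point, the unlearned copy assigns lower probability to that point's true label, while the original weights are left untouched.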

Ali Ebrahimpour-Boroojeny, Hari Sundaram, Varun Chandrasekaran (2025)

Related benchmarks

Task	Dataset	Result
Machine Unlearning	CIFAR-100 (test)	--	66
Machine Unlearning	SVHN	Retention Accuracy (RA) 94.1	9
Machine Unlearning	CIFAR-100 50% deletion for class 68 (road)	Utility Accuracy (UA) 92.8	8
Machine Unlearning	CIFAR-100 50% deletion of class 36 hamster (test)	Utility Accuracy (UA) 82.8	8
Machine Unlearning	CIFAR-100 apple class, 50% deletion (forget)	Utility Accuracy (UA) 88	8
Machine Unlearning	CIFAR-100 50% deletion of oak_tree class 52 (train test retain)	Utility Accuracy (UA) 68.2	8
Machine Unlearning	CIFAR-100 90% class deletion	UA 32.5	8
Machine Unlearning	CIFAR-100 class 73 (shark) 50% deletion	Utility Accuracy (UA) 67.9	8
Machine Unlearning	CIFAR-100 class-fraction-to-forget = 0.5	Unlearning Accuracy (UA) 67.4	8
Machine Unlearning	CIFAR-100 apple class 50% deletion (retain)	RA 94.3	8

