The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples

About

Machine unlearning offers a practical alternative to full model re-training by approximately removing the influence of specific user data. While existing methods certify unlearning via statistical indistinguishability from re-trained models, these guarantees do not naturally extend to model outputs when inputs are adversarially perturbed. In particular, slight perturbations of forget samples may still be correctly recognized by the unlearned model, even when a re-trained model fails to recognize them, revealing a novel privacy risk: information about the forget samples may persist in their local neighborhood. In this work, we formalize this vulnerability as residual knowledge and show that it is inevitable in high-dimensional settings. To mitigate this risk, we propose a fine-tuning strategy, named RURK, that penalizes the model's ability to re-recognize perturbed forget samples. Experiments on vision benchmarks with deep neural networks demonstrate that residual knowledge is prevalent across existing unlearning methods and that our approach effectively eliminates it.
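To make the vulnerability concrete, the sketch below shows one way to probe an unlearned model for residual knowledge and one guessed form of a re-recognition penalty. Everything here is a hypothetical illustration: the function names, the epsilon budget, and the use of uniform noise are assumptions, and the paper's actual RURK objective is not reproduced.

```python
import torch
import torch.nn.functional as F

# Hypothetical L-infinity perturbation budget around each forget sample;
# the paper's exact threat model and budget may differ.
EPSILON = 8 / 255


def residual_knowledge_rate(model, forget_loader, device="cpu"):
    """Fraction of perturbed forget samples the model still labels correctly.

    A high rate on the unlearned model, when the re-trained model's rate is
    low, is the residual-knowledge symptom: information about the forget
    samples persists in their local neighborhood. Uniform noise is used for
    simplicity; an adversarial search (e.g., PGD) would probe the same
    neighborhood more aggressively.
    """
    model.eval()
    hits, total = 0, 0
    with torch.no_grad():
        for x, y in forget_loader:
            x, y = x.to(device), y.to(device)
            noise = torch.empty_like(x).uniform_(-EPSILON, EPSILON)
            x_adv = (x + noise).clamp(0.0, 1.0)
            hits += (model(x_adv).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return hits / max(total, 1)


def re_recognition_penalty(model, x_forget, y_forget):
    """A guessed stand-in for the RURK penalty term, not the paper's loss.

    Returns the negative cross-entropy of the original labels on perturbed
    forget samples, so minimizing it pushes probability mass away from the
    forgotten labels throughout their neighborhood.
    """
    noise = torch.empty_like(x_forget).uniform_(-EPSILON, EPSILON)
    x_adv = (x_forget + noise).clamp(0.0, 1.0)
    return -F.cross_entropy(model(x_adv), y_forget)
```

In a fine-tuning loop, the penalty would presumably be combined with an ordinary retain-set loss, e.g. `loss = retain_loss + lam * re_recognition_penalty(model, x_forget, y_forget)`, so that utility on retained data is preserved while re-recognition of perturbed forget samples is suppressed.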

Hsiang Hsu, Pradeep Niroula, Zichang He, Ivan Brugere, Freddy Lecue, Chun-Fu Chen • 2026

Related benchmarks

Task                  Dataset               Metric              Result (%)  Rank
Class Unlearning      CIFAR-10              Retain Accuracy     99.8        60
Class Unlearning      Small CIFAR-5         Retention Accuracy  98.58       13
Image Classification  CIFAR-10 (test)       Retain Accuracy     99.55       11
Image Classification  Small CIFAR-5 (test)  Retention Accuracy  99.52       6
