Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA

About

Machine unlearning is an emerging technology that removes the influence of a subset of the training data from a trained model without significantly affecting the model's performance on the remaining data. This topic is becoming increasingly important for protecting user privacy and eliminating harmful or outdated data. The key challenge lies in effectively and efficiently unlearning specific information without compromising the model's utility on the retained data. For pre-trained models, fine-tuning is an important way to achieve the unlearning target. Previous work typically fine-tuned the entire model's parameters, which incurred significant computational costs. In addition, the fine-tuning process may cause shifts in the intermediate layer features, affecting the model's overall utility. In this work, we propose a novel and efficient machine unlearning method for pre-trained models, which we term Residual Feature Alignment Unlearning. Specifically, we leverage LoRA (Low-Rank Adaptation) to decompose the model's intermediate features into pre-trained features and residual features. By adjusting the residual features, we align the unlearned model with the pre-trained model at the intermediate feature level to achieve both the unlearning and the retention targets. The method aims to learn zero residuals on the retained set and shifted residuals on the unlearning set. Extensive experiments on numerous datasets validate the effectiveness of our approach.
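The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single linear layer with frozen weight W and LoRA factors A and B, so the layer output splits into a pre-trained feature W x and a residual feature B (A x). The helper names (`matvec`, `lora_features`, `alignment_loss`), the squared-error form of the loss, and the per-sample `forget_shift` targets are all hypothetical choices made for illustration; the abstract does not specify how the shifted residuals are defined.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_features(W, A, B, x):
    """Split a LoRA layer's output into pre-trained and residual features.

    W is the frozen pre-trained weight (d_out x d_in); A (r x d_in) and
    B (d_out x r) are the trainable low-rank factors.
    """
    pretrained = matvec(W, x)            # frozen path: W x
    residual = matvec(B, matvec(A, x))   # LoRA path: B (A x)
    return pretrained, residual


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment_loss(W, A, B, retain_x, forget_x, forget_shift):
    """Assumed objective: zero residual on retained samples, and a residual
    matching a given target shift on samples to be unlearned."""
    retain_term = 0.0
    for x in retain_x:
        _, res = lora_features(W, A, B, x)
        retain_term += sq_dist(res, [0.0] * len(res))
    forget_term = 0.0
    for x, shift in zip(forget_x, forget_shift):
        _, res = lora_features(W, A, B, x)
        forget_term += sq_dist(res, shift)
    return retain_term / len(retain_x) + forget_term / len(forget_x)


# Toy values (hypothetical): identity pre-trained weight, rank-1 LoRA.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B_zero = [[0.0], [0.0]]        # zero-initialised B: residual is zero everywhere
retain_x = [[2.0, 3.0]]
forget_x = [[1.0, 1.0]]
forget_shift = [[1.0, -1.0]]   # assumed target residual for the forget sample

print(alignment_loss(W, A, B_zero, retain_x, forget_x, forget_shift))
```

With B initialised to zero the residual vanishes on every input, so the retained-set term is already zero and the whole loss comes from the forget-set term. Training A and B would then have to move the forget-set residual toward its target while keeping the retained-set residual near zero; the frozen W is never modified.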

Laiqiao Qin, Tianqing Zhu, Linlin Wang, Wanlei Zhou • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Class Unlearning | CIFAR-10 | MIA Accuracy (Mean) | 0.00e+0 | 32 |
| Image Classification | Fashion-MNIST Sample Unlearning (Df) | Accuracy | 99.22 | 28 |
| Class Unlearning | CIFAR-10 | Training Time (s) | 79.05 | 24 |
| Sample Unlearning | CIFAR-10 | Training Time (seconds) | 87.42 | 24 |
| Image Classification (Class Unlearning) | Fashion MNIST | Activation Distance (Dr) | 1.36 | 16 |
| Image Classification (Sample Unlearning) | Fashion MNIST | Activation Distance (Dr) | 0.00e+0 | 16 |
| Image Classification | CIFAR-10 Class Unlearning (Drt) | Accuracy (Drt) | 98.72 | 14 |
| Class Unlearning | Fashion MNIST | MIA Rate | 0.00e+0 | 14 |
| Image Classification | CIFAR-10 Sample Unlearning (Dt) | Accuracy | 98.68 | 14 |
| Image Classification | Fashion-MNIST Sample Unlearning (Dr) | Accuracy | 100 | 14 |
Showing 10 of 41 rows
