
Stable Forgetting: Bounded Parameter-Efficient Unlearning in Foundation Models

About

Machine unlearning in foundation models (e.g., language and vision transformers) is essential for privacy and safety, yet existing approaches are unstable and unreliable. A widely used strategy, the gradient difference (GD) method, applies gradient descent on retained data while performing gradient ascent on forgotten data. When combined with a cross-entropy loss, this procedure can trigger unbounded growth of weights and gradients, degrading both forgetting and retention. We provide a theoretical framework that explains this failure by showing how ascent destabilizes optimization in the feedforward MLP layers of transformers. Guided by this insight, we propose *Bounded Parameter-Efficient Unlearning*, which stabilizes LoRA-based fine-tuning by applying bounded functions to the MLP adapters, controlling the weight dynamics during ascent and enabling reliable convergence. We validate the approach on Vision Transformer class deletion on CIFAR-100, where GD+Sine (gradient difference with a sine-bounded adapter) is the only evaluated method to achieve both high forget quality and model utility across ViT-B/16, ViT-L/14, and DeiT-S architectures, and demonstrate generality on language-model benchmarks (TOFU, TDEC, MUSE) across architectures from 22M to 8B parameters, achieving improved forgetting while preserving utility.
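The core idea above can be illustrated with a toy numeric sketch. The names and the rank-1 formulation below are illustrative, not from the paper's code: a plain LoRA-style update contributes an unbounded term whose magnitude can explode when gradient ascent inflates the adapter weights, whereas wrapping the adapter output in a bounded function such as sine caps its contribution regardless of weight magnitude.

```python
import math

def lora_delta(x, a, b, alpha=1.0):
    """Unbounded rank-1 LoRA-style update: alpha * b * (a * x).
    Grows without limit as the adapter weights a, b grow."""
    return alpha * b * (a * x)

def bounded_lora_delta(x, a, b, alpha=1.0):
    """Sine-bounded variant (hypothetical GD+Sine-style adapter):
    output always lies in [-alpha, alpha], however large a, b become."""
    return alpha * math.sin(b * (a * x))

# Simulate adapter weights blowing up under gradient ascent: the plain
# update diverges while the bounded one stays controlled.
for scale in (1.0, 10.0, 100.0):
    plain = lora_delta(0.5, a=scale, b=scale)
    bounded = bounded_lora_delta(0.5, a=scale, b=scale)
    print(f"scale={scale:>6}: plain={plain:>9.2f}  bounded={bounded:+.2f}")
```

This only demonstrates the boundedness property; the paper applies the bounding inside transformer MLP adapters during a gradient-difference objective (descent on retain data, ascent on forget data), which this sketch does not reproduce.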

Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Machine Unlearning | TOFU Forget01 (1% authors) | Forget Quality (ROUGE-L): 0.4 | 48 |
| Machine Unlearning | TOFU Forget10 (10% authors split) | Forget Quality (ROUGE-L): 0.32 | 42 |
| Machine Unlearning | TOFU Forget05 (5% authors) | Forget Quality (ROUGE-L): 0.32 | 42 |
| Privacy-Preserving Unlearning | TDEC | EL10 (%): 0.8 | 37 |
| Machine Unlearning | TOFU 1.0 (forget01) | MU Score: 52 | 33 |
| Machine Unlearning | TOFU Forget10, Phi-1.5B model | Forget Quality (FQ): 0.585 | 24 |
| Machine Unlearning | TOFU Forget01, Phi-1.5B model (1%) | Forget Quality (ROUGE-L): 38 | 24 |
| Machine Unlearning | TOFU Forget05, Phi-1.5B model (5%) | Forget Quality (ROUGE-L): 0.33 | 20 |
| Machine Unlearning | CIFAR-100 Random Forget (10%) | Retain Accuracy: 99.93 | 19 |
| Machine Unlearning | CIFAR-10 Random Forget 10% (train) | Retain Accuracy: 99.98 | 17 |

(Showing 10 of 20 rows.)
