Stable Forgetting: Bounded Parameter-Efficient Unlearning in Foundation Models
About
Machine unlearning in foundation models (e.g., language and vision transformers) is essential for privacy and safety; however, existing approaches are unstable and unreliable. A widely used strategy, the gradient difference method, applies gradient descent to retained data while performing gradient ascent on forgotten data. When combined with cross-entropy, this procedure can trigger the unbounded growth of weights and gradients, degrading both forgetting and retention. We provide a theoretical framework that explains this failure by showing how ascent destabilizes optimization in transformer feedforward MLP layers. Guided by this insight, we propose *Bounded Parameter-Efficient Unlearning*, which stabilizes LoRA-based fine-tuning by applying bounded functions to MLP adapters. This controls the weight dynamics during ascent and enables reliable convergence. We validate the approach on Vision Transformer class deletion on CIFAR-100, where GD+Sine is the only evaluated method to achieve both high forget quality and model utility across ViT-B/16, ViT-L/14, and DeiT-S architectures, and demonstrate generality on language-model benchmarks (TOFU, TDEC, MUSE) across architectures from 22M to 8B parameters, achieving improved forgetting while preserving utility.
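The instability described above can be illustrated with a toy scalar sketch. This is not the paper's implementation: it assumes a one-parameter logistic model with a single forget sample (x = 1, y = 1), and it stands in for a bounded adapter by writing the effective weight as `sin(theta)`. The function names `run_ascent_raw` and `run_ascent_sine` are hypothetical. With plain gradient ascent on cross-entropy the weight keeps drifting, because the gradient magnitude approaches 1 instead of vanishing. With the sine reparametrization the effective weight cannot leave [-1, 1], and the chain-rule factor `cos(theta)` shrinks the update as the bound is approached.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forget_loss_grad(w: float) -> float:
    """Derivative of the cross-entropy loss log(1 + exp(-w)) w.r.t. w.

    Toy assumption: one forget sample with input x = 1 and label y = 1.
    """
    return -sigmoid(-w)


def run_ascent_raw(steps: int, lr: float) -> float:
    """Gradient ASCENT applied directly to the weight; returns the final w."""
    w = 0.0
    for _ in range(steps):
        w += lr * forget_loss_grad(w)  # ascent: step along the gradient
    return w


def run_ascent_sine(steps: int, lr: float) -> float:
    """Gradient ascent on theta, with effective weight w = sin(theta).

    Assumption: sine stands in for the bounded function; the paper's exact
    placement inside the LoRA adapter is not reproduced here.
    """
    theta = 0.0
    for _ in range(steps):
        w = math.sin(theta)
        # Chain rule: dL/dtheta = dL/dw * cos(theta)
        theta += lr * forget_loss_grad(w) * math.cos(theta)
    return math.sin(theta)


if __name__ == "__main__":
    for steps in (10, 100, 1000):
        raw = run_ascent_raw(steps, lr=0.5)
        bounded = run_ascent_sine(steps, lr=0.5)
        print(f"steps={steps:5d}  raw w={raw:10.3f}  sine-bounded w={bounded:7.4f}")
```

In this toy setting the raw weight grows roughly linearly with the number of ascent steps, while the sine-bounded weight settles near -1. Real transformer MLP layers are far higher-dimensional, so this only conveys the intuition behind bounding the adapter.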
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Model Unlearning | TOFU Forget10 | Forget Quality (FQ) | 94.3 | 54 |
| Machine Unlearning | TOFU 1.0 (Forget10) | Model Utility (MU) | 52 | 53 |
| Machine Unlearning | TOFU 1.0 (forget01) | -- | -- | 53 |
| Machine Unlearning | TOFU Forget01 (1% authors) | Forget Quality (ROUGE-L) | 0.4 | 48 |
| Machine Unlearning | TOFU Forget10 (10% authors split) | Forget Quality (ROUGE-L) | 0.32 | 42 |
| Machine Unlearning | TOFU Forget05 (5% authors) | Forget Quality (ROUGE-L) | 0.32 | 42 |
| Machine Unlearning | CIFAR-10 Random Forget 10% (train) | Retain Accuracy | 99.98 | 37 |
| Privacy-preserving unlearning | TDEC | EL10 (%) | 0.8 | 37 |
| Machine Unlearning | TOFU Forget05 Phi-1.5B model (5%) | Model Utility (MU) | 0.52 | 32 |
| Machine Unlearning | TOFU Forget10 Phi-1.5B model | Forget Quality (FQ) | 0.585 | 24 |