Stable Forgetting: Bounded Parameter-Efficient Unlearning in Foundation Models
About
Machine unlearning in foundation models (e.g., language and vision transformers) is essential for privacy and safety, yet existing approaches are unstable and unreliable. A widely used strategy, the gradient difference method, applies gradient descent on retained data while performing gradient ascent on forgotten data. When combined with the cross-entropy loss, this procedure can trigger unbounded growth of weights and gradients, degrading both forgetting and retention. We provide a theoretical framework that explains this failure by showing how ascent destabilizes optimization in the feedforward (MLP) layers of transformers. Guided by this insight, we propose *Bounded Parameter-Efficient Unlearning*, which stabilizes LoRA-based fine-tuning by applying bounded functions to the MLP adapters, controlling the weight dynamics during ascent and enabling reliable convergence. We validate the approach on Vision Transformer class deletion on CIFAR-100, where GD+Sine (gradient difference with sine-bounded adapters) is the only evaluated method to achieve both high forget quality and high model utility across ViT-B/16, ViT-L/14, and DeiT-S, and we demonstrate generality on language-model benchmarks (TOFU, TDEC, MUSE) across architectures from 22M to 8B parameters, achieving improved forgetting while preserving utility.
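The instability and its fix can be illustrated with a minimal 1-D sketch (not the paper's implementation; the scalar "gradients", step size, and bound `alpha` are invented for illustration). Under gradient difference, a non-vanishing ascent gradient on the forget term pushes a raw weight without bound; reparameterizing the adapter's effective weight through a bounded function such as `alpha * sin(theta)` keeps its contribution in `[-alpha, alpha]` no matter how far the underlying parameter drifts:

```python
import math

def gradient_difference_step(w, g_retain, g_forget, lr=0.1):
    # Gradient difference: descend on the retain loss, ascend on the forget loss.
    return w - lr * (g_retain - g_forget)

# Unbounded case: a constant ascent pull (stand-in for a non-vanishing
# cross-entropy gradient on the forget set) makes the raw weight grow linearly.
w = 1.0
for _ in range(100):
    w = gradient_difference_step(w, g_retain=0.0, g_forget=1.0)
print(w)  # drifts to about 11.0 after 100 steps of +0.1 each

# Bounded reparameterization (sketch of the idea): the adapter contributes
# w_eff = alpha * sin(theta), so the same ascent pull acts on theta through
# the chain rule, and the effective weight can never leave [-alpha, alpha].
alpha, theta = 0.5, 1.0
for _ in range(100):
    # d/dtheta [alpha * sin(theta)] = alpha * cos(theta)
    grad_theta = (0.0 - 1.0) * alpha * math.cos(theta)
    theta -= 0.1 * grad_theta
w_eff = alpha * math.sin(theta)
print(w_eff)  # stays within [-alpha, alpha] by construction
```

In this toy, `theta` settles near `pi/2` (where the bounded weight's gradient vanishes), so ascent saturates instead of diverging, which is the stabilizing behavior the bounded-adapter construction is designed to provide.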
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Machine Unlearning | TOFU Forget01 (1% of authors) | Forget Quality (ROUGE-L) | 0.4 | 48 |
| Machine Unlearning | TOFU Forget10 (10% of authors) | Forget Quality (ROUGE-L) | 0.32 | 42 |
| Machine Unlearning | TOFU Forget05 (5% of authors) | Forget Quality (ROUGE-L) | 0.32 | 42 |
| Privacy-Preserving Unlearning | TDEC | EL10 (%) | 0.8 | 37 |
| Machine Unlearning | TOFU 1.0 (forget01) | MU Score | 52 | 33 |
| Machine Unlearning | TOFU Forget10 (10%), Phi-1.5 | Forget Quality (FQ) | 0.585 | 24 |
| Machine Unlearning | TOFU Forget01 (1%), Phi-1.5 | Forget Quality (ROUGE-L) | 38 | 24 |
| Machine Unlearning | TOFU Forget05 (5%), Phi-1.5 | Forget Quality (ROUGE-L) | 0.33 | 20 |
| Machine Unlearning | CIFAR-100 Random Forget (10%) | Retain Accuracy (RA) | 99.93 | 19 |
| Machine Unlearning | CIFAR-10 Random Forget (10%, train) | Retain Accuracy | 99.98 | 17 |