Stable Forgetting: Bounded Parameter-Efficient Unlearning in Foundation Models

About

Machine unlearning in foundation models (e.g., language and vision transformers) is essential for privacy and safety; however, existing approaches are unstable and unreliable. A widely used strategy, the gradient difference (GD) method, applies gradient descent to retained data while performing gradient ascent on forgotten data. When combined with cross-entropy, this procedure can trigger the unbounded growth of weights and gradients, degrading both forgetting and retention. We provide a theoretical framework that explains this failure by showing how ascent destabilizes optimization in transformer feedforward MLP layers. Guided by this insight, we propose *Bounded Parameter-Efficient Unlearning*, which stabilizes LoRA-based fine-tuning by applying bounded functions to MLP adapters. This controls the weight dynamics during ascent and enables reliable convergence. We validate the approach on Vision Transformer class deletion on CIFAR-100, where GD+Sine is the only evaluated method to achieve both high forget quality and model utility across ViT-B/16, ViT-L/14, and DeiT-S architectures, and demonstrate generality on language-model benchmarks (TOFU, TDEC, MUSE) across architectures from 22M to 8B parameters, achieving improved forgetting while preserving utility.

Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey • 2025
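
The two mechanisms named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The abstract does not say where the bounded function is applied, how it is scaled, or how the ascent term is weighted, so applying the sine elementwise to the low-rank product, the `omega` frequency parameter, and all function names here are assumptions made for illustration only. The first half shows why a gradient difference objective built on cross-entropy has no lower bound. The second half shows that a sine-wrapped low-rank update stays in [-1, 1] however large its factors become.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def gradient_difference_loss(retain, forget):
    """Gradient difference objective: retain loss minus forget loss.

    `retain` and `forget` are lists of (logits, target) pairs. Minimizing the
    returned value descends on retained data and ascends on forgotten data.
    """
    retain_loss = sum(cross_entropy(l, t) for l, t in retain) / len(retain)
    forget_loss = sum(cross_entropy(l, t) for l, t in forget) / len(forget)
    return retain_loss - forget_loss


def matmul(b, a):
    """Plain-Python product of a (d x r) matrix and an (r x k) matrix."""
    return [[sum(b[i][j] * a[j][c] for j in range(len(a)))
             for c in range(len(a[0]))] for i in range(len(b))]


def lora_update(b, a):
    """Standard low-rank update B @ A: grows with the size of B and A."""
    return matmul(b, a)


def bounded_lora_update(b, a, omega=1.0):
    """Assumed bounded variant: sin(omega * (B @ A)), elementwise.

    Every entry lies in [-1, 1] regardless of how large B and A become.
    `omega` is a hypothetical frequency parameter, not taken from the paper.
    """
    return [[math.sin(omega * v) for v in row] for row in matmul(b, a)]


def max_abs(m):
    """Largest absolute entry of a matrix given as a list of rows."""
    return max(abs(v) for row in m for v in row)


if __name__ == "__main__":
    # As the logit margin against the forgotten label grows, the forget loss
    # grows roughly linearly, so the objective keeps decreasing without bound.
    for margin in (1.0, 10.0, 100.0):
        retain = [([margin, 0.0], 0)]
        forget = [([margin, 0.0], 1)]
        print("margin", margin, "objective", gradient_difference_loss(retain, forget))

    # Scaling the factors inflates the plain update but not the bounded one.
    for scale in (1.0, 10.0, 100.0):
        b = [[1.0 * scale], [2.0 * scale]]
        a = [[3.0 * scale, 4.0 * scale]]
        print("scale", scale,
              "plain", max_abs(lora_update(b, a)),
              "bounded", max_abs(bounded_lora_update(b, a)))
```

The sketch covers only the forward quantities. It does not model training dynamics, and it should not be read as evidence for the convergence or benchmark results the abstract reports.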

Related benchmarks

Task	Dataset	Result
Language Model Unlearning	TOFU Forget10	Forget Quality (FQ): 94.3	54
Machine Unlearning	TOFU 1.0 (Forget10)	Model Utility (MU): 52	53
Machine Unlearning	TOFU 1.0 (forget01)	--	53
Machine Unlearning	TOFU Forget01 (1% authors)	Forget Quality (ROUGE-L): 0.4	48
Machine Unlearning	TOFU Forget10 (10% authors split)	Forget Quality (ROUGE-L): 0.32	42
Machine Unlearning	TOFU Forget05 (5% authors)	Forget Quality (ROUGE-L): 0.32	42
Machine Unlearning	CIFAR-10 Random Forget 10% (train)	Retain Accuracy: 99.98	37
Privacy-preserving unlearning	TDEC	EL10 (%): 0.8	37
Machine Unlearning	TOFU Forget05 Phi-1.5B model (5%)	Model Utility (MU): 0.52	32
Machine Unlearning	TOFU Forget10 Phi-1.5B model	Forget Quality (FQ): 0.585	24

Showing 10 of 20 rows
