
Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning

About

Unlearning in large foundation models (e.g., LLMs) is essential for enabling dynamic knowledge updates, enforcing data deletion rights, and correcting model behavior. However, existing unlearning methods often require full-model fine-tuning or access to the original training data, which limits their scalability and practicality. In this work, we introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in LLMs based on reconstructing full-model gradient directions from low-rank LoRA adapter updates. Rather than performing backpropagation through the full model, we compute gradients with respect to LoRA parameters using multiple paraphrased prompts and train a gradient decoder to approximate the corresponding full-model gradients. To ensure applicability to larger or black-box models, the decoder is trained on a proxy model and transferred to target models. We provide a theoretical analysis of cross-model generalization and demonstrate that our method achieves effective unlearning while preserving general model performance. Experimental results demonstrate that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.
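The core mechanism described above — learning a decoder that maps gradients of the low-rank LoRA parameters to an estimate of the full-model gradient, using pairs collected on a proxy model — can be illustrated with a toy numerical sketch. Everything below (dimensions, the linear decoder, the simulated proxy data, all variable names) is a hypothetical illustration of the idea, not the paper's implementation; the actual method uses a learned decoder network and real model gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: LoRA-gradient features are much smaller
# than the full-model gradient they help reconstruct.
d_lora, d_full, n_pairs = 16, 256, 500

# Simulated proxy-model training pairs: assume full-model gradients lie
# approximately in a subspace reachable from the LoRA gradients.
mixing = rng.normal(size=(d_lora, d_full))          # unknown "true" map
lora_grads = rng.normal(size=(n_pairs, d_lora))     # grads w.r.t. LoRA params
full_grads = lora_grads @ mixing + 0.01 * rng.normal(size=(n_pairs, d_full))

# Fit a linear gradient decoder by least squares (a stand-in for the
# trained decoder network transferred from the proxy model).
decoder, *_ = np.linalg.lstsq(lora_grads, full_grads, rcond=None)

# At unlearning time: from a new LoRA gradient (e.g., averaged over
# paraphrased prompts on the forget example), reconstruct an approximate
# full-model gradient direction without full backpropagation.
new_lora_grad = rng.normal(size=(1, d_lora))
reconstructed = new_lora_grad @ decoder

true_full = new_lora_grad @ mixing
rel_err = np.linalg.norm(true_full - reconstructed) / np.linalg.norm(true_full)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The reconstructed direction would then drive the unlearning update (e.g., an ascent step on the forget data); the sketch only checks that the low-rank signal suffices to recover the full-dimensional direction under the stated linear-subspace assumption.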

Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani • 2025

Related benchmarks

Task              | Dataset   | Result    | Rank
------------------|-----------|-----------|-----
LLM Unlearning    | RWKU      | USR 89.3  | 16
Machine Unlearning| MUSE      | --        | 16
Machine Unlearning| WaterDrum | USR 87.4  | 8
Machine Unlearning| WMDP      | MIA 0.049 | 8
Relearning Attack | RWKU      | RAP 18.3  | 8
Relearning Attack | WMDP      | RAP 20.1  | 8
Relearning Attack | MUSE      | RAP 22.5  | 8
Relearning Attack | WaterDrum | RAP 19.4  | 8
