
MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

About

Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic privacy-preserving Multiple Perturbed Copies Unlearning framework that primarily introduces two server-side modules: Pre-Process for randomized copy generation and Post-Process for update aggregation. In Pre-Process, the server distributes multiple perturbed and reparameterized model instances, allowing the client to execute unlearning locally on its private forget set without accessing the server's exact original parameters. After local unlearning, the server performs Post-Process by inverting the reparameterization and aggregating updates with a harmonic denoising procedure to alleviate the impact of perturbation. Experiments with seven unlearning algorithms show that MPU achieves unlearning performance comparable to noise-free baselines: for most algorithms, average degradation stays well below 1% at noise levels up to 10%, and some algorithms even outperform the noise-free baseline under 1% noise. Code is available at https://github.com/Tristan0318/MPU.
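The Pre-Process / local unlearning / Post-Process flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the perturbation, the reparameterization, or the harmonic denoising procedure, so the sketch substitutes zero-sum Gaussian noise, a random coordinate permutation, and plain averaging. All function names (`pre_process`, `client_unlearn`, `post_process`) and parameters are hypothetical, and the "model" is just a list of floats.

```python
import random


def pre_process(weights, num_copies, noise_scale, rng):
    """Server side: build perturbed, permuted copies of `weights`.

    Assumptions of this sketch (not taken from the paper):
    - noise is Gaussian and centered so it sums to zero across copies;
    - the reparameterization is a secret permutation of coordinates.
    """
    n = len(weights)
    raw = [[rng.gauss(0.0, noise_scale) for _ in range(n)]
           for _ in range(num_copies)]
    mean = [sum(col) / num_copies for col in zip(*raw)]
    noises = [[r - m for r, m in zip(row, mean)] for row in raw]

    copies, secrets = [], []
    for noise in noises:
        perturbed = [w + e for w, e in zip(weights, noise)]
        perm = list(range(n))
        rng.shuffle(perm)
        # The client only ever sees the perturbed, permuted vector.
        copies.append([perturbed[i] for i in perm])
        # The server keeps what it needs to undo both steps later.
        secrets.append((perm, perturbed))
    return copies, secrets


def client_unlearn(params, lr):
    """Client side: stand-in for any local unlearning algorithm.

    Here it is one gradient step on 0.5 * ||params||^2, chosen only
    because it is simple and acts the same way on every coordinate.
    """
    return [(1.0 - lr) * p for p in params]


def post_process(weights, returned, secrets):
    """Server side: invert the permutation and average the updates."""
    n = len(weights)
    total = [0.0] * n
    for updated, (perm, perturbed) in zip(returned, secrets):
        restored = [0.0] * n
        for pos, i in enumerate(perm):
            restored[i] = updated[pos]  # undo the permutation
        for j in range(n):
            total[j] += restored[j] - perturbed[j]  # per-copy update
    k = len(returned)
    return [w + t / k for w, t in zip(weights, total)]


rng = random.Random(0)
server_weights = [0.5, -1.2, 2.0, 0.3]
copies, secrets = pre_process(server_weights, num_copies=4,
                              noise_scale=0.1, rng=rng)
returned = [client_unlearn(c, lr=0.1) for c in copies]
new_weights = post_process(server_weights, returned, secrets)
baseline = client_unlearn(server_weights, lr=0.1)  # noise-free reference
```

In this toy the client step is linear, so the zero-sum noise cancels and `new_weights` matches `baseline` up to floating-point error. Real unlearning updates are nonlinear in the parameters, so some residual from the perturbation would remain after averaging; that residual is what a dedicated denoising step would need to address.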

Tiantong Wang, Xinyu Yan, Tiantong Wu, Yurong Hao, Pengjun Xie, Wei Yang Bryan Lim • 2026

Related benchmarks

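| Task               | Dataset                     | Result                    | Rank |
|--------------------|-----------------------------|---------------------------|------|
| Machine Unlearning | MUSE Books                  | Privacy Leakage -89.6     | 83   |
| Machine Unlearning | MUSE NEWS                   | --                        | 34   |
| Machine Unlearning | TOFU (Split99)              | Forget Quality 1          | 28   |
| Unlearning         | TOFU (Split99)              | Forget Quality 1          | 25   |
| Machine Unlearning | TOFU Llama-3.2-1B (Split99) | Forget Quality 91.9       | 21   |
| Machine Unlearning | MUSE NEWS                   | Extraction Strength 0.975 | 12   |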