# Teleportation-Based Defenses for Privacy in Approximate Machine Unlearning

## About
Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical alternative to full retraining. However, it introduces privacy risks: an adversary with access to pre- and post-unlearning models can exploit their differences for membership inference or data reconstruction. We show these vulnerabilities arise from two factors: large gradient norms of forget-set samples and the close proximity of unlearned parameters to the original model. To demonstrate their severity, we propose unlearning-specific membership inference and reconstruction attacks, showing that several state-of-the-art methods (e.g., NGP, SCRUB) remain vulnerable. To mitigate this leakage, we introduce WARP, a plug-and-play teleportation defense that leverages neural network symmetries to reduce forget-set gradient energy and increase parameter dispersion while preserving predictions. This reparameterization obfuscates the signal of forgotten data, making it harder for attackers to distinguish forgotten samples from non-members or recover them via reconstruction. Across six unlearning algorithms, our approach achieves consistent privacy gains, reducing adversarial advantage (AUC) by up to 64% in black-box and 92% in white-box settings, while maintaining accuracy on retained data. These results highlight teleportation as a general tool for reducing attack success in approximate unlearning.
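The defense builds on parameter-space symmetries: a network can be reparameterized so its weights move far from the original point while its predictions stay exactly the same. The snippet below is a minimal illustrative sketch of one such symmetry (the positive scaling symmetry of ReLU layers), not the paper's WARP algorithm; the network, shapes, and scale factors are hypothetical.

```python
import numpy as np

# Sketch of a "teleportation" symmetry for a two-layer ReLU network.
# Because ReLU is positively homogeneous (relu(a*z) = a*relu(z) for a > 0),
# scaling each hidden neuron's incoming weights by alpha and dividing its
# outgoing weights by alpha leaves the function unchanged, while the
# parameter vector itself moves away from the original model.

def forward(x, W1, b1, W2, b2):
    h = np.maximum(0.0, x @ W1 + b1)  # hidden ReLU activations
    return h @ W2 + b2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 3)); b2 = rng.normal(size=3)
x = rng.normal(size=(5, 4))

# "Teleport": pick positive per-neuron scales and rebalance the two layers.
alpha = rng.uniform(0.5, 2.0, size=8)
W1t, b1t = W1 * alpha, b1 * alpha   # scale each hidden neuron's inputs
W2t = W2 / alpha[:, None]           # undo the scale on its outputs

y = forward(x, W1, b1, W2, b2)
yt = forward(x, W1t, b1t, W2t, b2)
assert np.allclose(y, yt)           # predictions preserved exactly
```

A defense in this spirit can choose the scales to reduce forget-set gradient norms and push the unlearned parameters away from the original model, which is the leakage signal the attacks in the paper exploit.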
## Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy | 79.7 | 3381 |
| Membership Inference Attack | CIFAR-10 (forget set) | AUC | 66.1 | 12 |
| Black-box Membership Inference Attack | CIFAR-10 (most-memorized 1% forget samples) | AUC | 0.875 | 12 |
| Membership Inference Attack | CIFAR-10 (all forget samples) | AUC | 0.516 | 5 |
| Membership Inference Attack | CIFAR-10 (most-memorized top-5% forget samples) | AUC | 59.8 | 5 |
| Reconstruction Attack | ImageNet-1K (100 forgotten samples) | PSNR (dB) | 10.74 | 2 |
| Data Reconstruction Attack | ImageNet-1K | PSNR (dB) | 7.38 | 2 |
| White-box Membership Inference Attack | Tiny-ImageNet (forget set) | AUC | 0.755 | 2 |