Representation Unlearning: Forgetting through Information Compression
About
Machine unlearning seeks to remove the influence of specific training data from a model, a need driven by privacy regulations and robustness concerns. Existing approaches typically modify model parameters, but such updates can be unstable, computationally costly, and limited by local approximations. We introduce Representation Unlearning, a framework that performs unlearning directly in the model's representation space. Instead of modifying model parameters, we learn a transformation over representations that imposes an information bottleneck: maximizing mutual information with retained data while suppressing information about data to be forgotten. We derive variational surrogates that make this objective tractable and show how they can be instantiated in two practical regimes: when both retain and forget data are available, and in a zero-shot setting where only forget data can be accessed. Experiments across several benchmarks demonstrate that Representation Unlearning achieves more reliable forgetting, better utility retention, and greater computational efficiency than parameter-centric baselines.
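The objective above can be sketched in code. The following is a minimal, illustrative NumPy sketch (not the paper's implementation): the retain term is a prototype-based cross-entropy, a standard variational lower bound on the mutual information between the transformed representations and retain labels, and the forget term is a Gaussian-KL penalty, a standard variational upper bound on the information the representations carry about forget inputs. The function names, the linear transform `W`, the prototype head, and the weight `lam` are all assumptions for illustration.

```python
import numpy as np

def retain_term(z, y, protos):
    # Prototype cross-entropy: a variational lower bound on I(z; y)
    # (assumption: a linear prototype head stands in for the variational decoder).
    logits = z @ protos.T
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return logp[np.arange(len(y)), y].mean()             # mean log-likelihood of true class

def forget_term(z):
    # KL(N(z, I) || N(0, I)) = 0.5 * ||z||^2: a variational upper bound on the
    # information z retains about the forget inputs (pushes them toward the prior).
    return 0.5 * (z ** 2).sum(axis=1).mean()

def unlearning_loss(W, h_retain, y_retain, h_forget, protos, lam=1.0):
    # z = W h is the learned transformation over frozen model representations;
    # minimize: -(retain lower bound) + lam * (forget upper bound).
    z_r, z_f = h_retain @ W.T, h_forget @ W.T
    return -retain_term(z_r, y_retain, protos) + lam * forget_term(z_f)
```

Only `W` (and optionally the prototypes) would be optimized; the base model's parameters stay frozen, which is what makes the approach cheaper than parameter-centric unlearning. In the zero-shot regime described above, the retain term would be dropped or replaced by a proxy, since only forget data is available.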
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class Unlearning | CIFAR-10 (test) | Test Accuracy: 93.5 | 21 |
| Class Unlearning | Tiny ImageNet (test) | -- | 19 |
| Class Unlearning | CIFAR-100 (test) | -- | 13 |
| Random Data Unlearning | CIFAR-10 (train/test) | Train Accuracy Retention: 100 | 10 |
| Random Data Unlearning | Tiny ImageNet (train/test) | Train Accuracy: 1 | 10 |
| Random Data Unlearning | CIFAR-100 (train/test) | Train Accuracy Retention: 99.9 | 10 |