
Representation Unlearning: Forgetting through Information Compression

About

Machine unlearning seeks to remove the influence of specific training data from a model, a need driven by privacy regulations and robustness concerns. Existing approaches typically modify model parameters, but such updates can be unstable, computationally costly, and limited by local approximations. We introduce Representation Unlearning, a framework that performs unlearning directly in the model's representation space. Instead of modifying model parameters, we learn a transformation over representations that imposes an information bottleneck: maximizing mutual information with retained data while suppressing information about data to be forgotten. We derive variational surrogates that make this objective tractable and show how they can be instantiated in two practical regimes: when both retain and forget data are available, and in a zero-shot setting where only forget data can be accessed. Experiments across several benchmarks demonstrate that Representation Unlearning achieves more reliable forgetting, better utility retention, and greater computational efficiency than parameter-centric baselines.
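To make the bottleneck idea concrete, here is a toy numpy sketch of unlearning in representation space: a linear map over frozen representations that preserves retain-set structure while compressing away forget-set structure. This is an illustration only, not the paper's variational surrogates; the quadratic proxy for the two mutual-information terms, the closed-form solver, and the names `unlearn_map` and `lam` are all our assumptions.

```python
import numpy as np

# Toy proxy (not the paper's objective): learn a linear transform W over
# fixed d-dimensional representations that minimizes
#     ||W Z_r - Z_r||^2  +  lam * ||W Z_f||^2
# i.e. reconstruct retain representations Z_r while shrinking forget
# representations Z_f. The minimizer has the closed form
#     W = A (A + lam * B)^{-1},
# with A = Z_r Z_r^T / n and B = Z_f Z_f^T / n (second-moment matrices).

def unlearn_map(z_retain, z_forget, lam=10.0):
    """Fit a (d x d) unlearning transform from (d x n) representation matrices."""
    a = z_retain @ z_retain.T / z_retain.shape[1]  # retain second moments
    b = z_forget @ z_forget.T / z_forget.shape[1]  # forget second moments
    return a @ np.linalg.inv(a + lam * b)

rng = np.random.default_rng(0)
d, n = 6, 500
z_retain = rng.standard_normal((d, n))
# Forget-set information concentrated in the first two coordinates,
# plus a little noise everywhere.
z_forget = np.zeros((d, n))
z_forget[:2] = rng.standard_normal((2, n))
z_forget += 0.05 * rng.standard_normal((d, n))

w = unlearn_map(z_retain, z_forget)
forget_ratio = np.linalg.norm(w @ z_forget) ** 2 / np.linalg.norm(z_forget) ** 2
retain_ratio = np.linalg.norm(w @ z_retain) ** 2 / np.linalg.norm(z_retain) ** 2
print(f"forget energy kept: {forget_ratio:.3f}, retain energy kept: {retain_ratio:.3f}")
```

The transform strongly attenuates the directions carrying forget-set energy while leaving most retain-set energy intact, which is the qualitative behavior the paper's information-compression objective targets, without touching any model parameters.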

Antonio Almudévar, Alfonso Ortega • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class Unlearning | CIFAR-10 (test) | Test Accuracy: 93.5 | 21 |
| Class Unlearning | Tiny ImageNet (test) | -- | 19 |
| Class Unlearning | CIFAR-100 (test) | -- | 13 |
| Random Data Unlearning | CIFAR-10 (train/test) | Train Accuracy Retention: 100 | 10 |
| Random Data Unlearning | Tiny ImageNet (train/test) | Train Accuracy: 1 | 10 |
| Random Data Unlearning | CIFAR-100 (train/test) | Train Accuracy Retention: 99.9 | 10 |
