Representation Unlearning: Forgetting through Information Compression
About
Machine unlearning seeks to remove the influence of specific training data from a model, a need driven by privacy regulations and robustness concerns. Existing approaches typically modify model parameters, but such updates can be unstable, computationally costly, and limited by local approximations. We introduce Representation Unlearning, a framework that performs unlearning directly in the model's representation space. Instead of modifying model parameters, we learn a transformation over representations that imposes an information bottleneck: maximizing mutual information with retained data while suppressing information about data to be forgotten. We derive variational surrogates that make this objective tractable and show how they can be instantiated in two practical regimes: when both retain and forget data are available, and in a zero-shot setting where only forget data can be accessed. Experiments across several benchmarks demonstrate that Representation Unlearning achieves more reliable forgetting, better utility retention, and greater computational efficiency than parameter-centric baselines.
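The objective above can be sketched in code. The following is a minimal, illustrative NumPy sketch (not the paper's implementation): the retain term is a prototype-based cross-entropy, a standard variational lower bound on the mutual information between the transformed representations and retain labels, and the forget term is a Gaussian-KL penalty, a standard variational upper bound on the information the representations carry about forget inputs. The function names, the linear transform `W`, the prototype head, and the weight `lam` are all assumptions for illustration.

```python
import numpy as np

def retain_term(z, y, protos):
    # Prototype cross-entropy: a variational lower bound on I(z; y)
    # (assumption: a linear prototype head stands in for the variational decoder).
    logits = z @ protos.T
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return logp[np.arange(len(y)), y].mean()             # mean log-likelihood of true class

def forget_term(z):
    # KL(N(z, I) || N(0, I)) = 0.5 * ||z||^2: a variational upper bound on the
    # information z retains about the forget inputs (pushes them toward the prior).
    return 0.5 * (z ** 2).sum(axis=1).mean()

def unlearning_loss(W, h_retain, y_retain, h_forget, protos, lam=1.0):
    # z = W h is the learned transformation over frozen model representations;
    # minimize: -(retain lower bound) + lam * (forget upper bound).
    z_r, z_f = h_retain @ W.T, h_forget @ W.T
    return -retain_term(z_r, y_retain, protos) + lam * forget_term(z_f)
```

Only `W` (and optionally the prototypes) would be optimized; the base model's parameters stay frozen, which is what makes the approach cheaper than parameter-centric unlearning. In the zero-shot regime described above, the retain term would be dropped or replaced by a proxy, since only forget data is available.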
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class Unlearning | CIFAR-10 (test) | Test Accuracy: 93.5 | 21 |
| Class Unlearning | Tiny ImageNet (test) | -- | 19 |
| Class Unlearning | CIFAR-100 (test) | -- | 13 |
| Random Data Unlearning | CIFAR-10 (train/test) | Train Accuracy Retention: 100 | 10 |
| Random Data Unlearning | Tiny ImageNet (train/test) | Train Accuracy: 1 | 10 |
| Random Data Unlearning | CIFAR-100 (train/test) | Train Accuracy Retention: 99.9 | 10 |