BARRIER: Bounded Activation Regions for Robust Information Erasure

About

Machine unlearning has reached a critical bottleneck. Traditional weight-space interventions focus primarily on erasing targeted concepts, and they often fail to prevent the unintended suppression of other significant representations. Because these methods lack formal mathematical guarantees for the preservation of neutral concepts, they cause substantial collateral damage in which essential knowledge is forgotten, and to avoid this degradation they are frequently forced into conservative updates. We propose BARRIER (Bounded Activation Regions for Robust Information Erasure), a framework that moves the locus of intervention from static model weights to the dynamic geometry of hidden-layer activations. Unlike existing methods, BARRIER applies Interval Arithmetic (IA) to SVD-based projections of the activation space to enclose the specific target region within a bounding hypercube. By driving unlearning updates exclusively within this forget interval and mathematically bounding the model response on the complement, we ensure rigorous protection of the retain distribution. This geometric construction turns the preservation of knowledge from an empirical heuristic into a formal optimization target with a probabilistic tail bound on functional drift. Crucially, this stability permits highly aggressive unlearning updates within the forget region. Empirical evaluations demonstrate that BARRIER matches state-of-the-art trade-offs across classifiers and diffusion models, maximizing targeted concept erasure while safeguarding the integrity of all other representations. Our code is available at https://github.com/OneAndZero24/BARRIER.
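The geometric idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal illustration under stated assumptions: forget-set activations arrive as a matrix `acts`, the bounding hypercube is taken as the per-coordinate min/max in a top-`k` SVD basis, and interval arithmetic is shown only for a single affine map. The function names `fit_forget_box`, `in_box`, and `affine_interval` are hypothetical.

```python
import numpy as np


def fit_forget_box(acts, k):
    """Fit an axis-aligned box around forget activations in a top-k SVD basis.

    acts: (n, d) array of hidden-layer activations for forget samples.
    Returns the mean, the (k, d) projection basis, and the box bounds lo/hi.
    Assumption: the box is the coordinate-wise min/max of the projections.
    """
    mean = acts.mean(axis=0)
    # Rows of vt are the right singular vectors of the centered activations.
    _, _, vt = np.linalg.svd(acts - mean, full_matrices=False)
    basis = vt[:k]
    z = (acts - mean) @ basis.T  # (n, k) projected coordinates
    return mean, basis, z.min(axis=0), z.max(axis=0)


def in_box(x, mean, basis, lo, hi):
    """Return True where a projected activation lies inside the forget box."""
    z = (x - mean) @ basis.T
    return np.all((z >= lo) & (z <= hi), axis=-1)


def affine_interval(W, b, lo, hi):
    """Interval arithmetic for y = W z + b when z ranges over the box [lo, hi].

    Uses the center/radius form: the output center is W c + b and the output
    radius is |W| r, which gives sound elementwise bounds on y.
    """
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    y_center = W @ center + b
    y_radius = np.abs(W) @ radius
    return y_center - y_radius, y_center + y_radius
```

In this sketch, `in_box` is the kind of test that could gate an update to the forget region only, and `affine_interval` shows how a box in projected space yields guaranteed output bounds for one linear layer; how BARRIER actually combines these pieces, and how it bounds the response on the complement, is specified in the paper rather than here.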

Jan Miksa, Patryk Krukowski, Przemysław Spurek, Dawid Damian Rymarczyk, Marcin Sendera • 2026

Related benchmarks

Task	Dataset	Result
Class Erasure	Imagenette	UA: 100	66
Class Unlearning	CIFAR-10 (test)	Test Accuracy: 92.26	42
Utility Preservation	MS-COCO 10k	FID: 31.3	32
Object Classification Unlearning	CIFAR-10 (10% random data forgetting)	UA: 0.53	25
Image Classification Unlearning	CIFAR-100 50% random data forgetting	MIA (Membership Inference Attack): 5.74	21
Image Classification Unlearning	CIFAR-10 50% Random Forgetting	MIA: 0.0112	21
Explicit Content Unlearning	I2P	Total Count: 171	21
Classification Unlearning	CIFAR-100 (10% Random Data Forgetting)	Utility Accuracy (UA): 2.8	13
Generative Model Unlearning	CIFAR-10 (test)	Utility Score: 95.2	11
NSFW concept unlearning	I2P	Common Count: 14	11

Showing 10 of 11 rows
