
MeGU: Machine-Guided Unlearning with Target Feature Disentanglement

About

The growing concern over training-data privacy has elevated the "Right to be Forgotten" into a critical requirement, raising the demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental trade-off: aggressively erasing the influence of target data often degrades model utility on retained data, while conservative strategies leave residual target information intact. In this work, we analyze the intrinsic representation properties learned during model pretraining and demonstrate that semantic class concepts are entangled at the feature-pattern level: classes share associated features while retaining concept-specific discriminative components. This entanglement fundamentally limits the effectiveness of existing unlearning paradigms. Motivated by this insight, we propose Machine-Guided Unlearning (MeGU), a novel framework that guides unlearning through concept-aware re-alignment. Specifically, we leverage Multi-modal Large Language Models (MLLMs) to explicitly determine re-alignment directions for target samples by assigning semantically meaningful perturbing labels. To improve efficiency, the inter-class conceptual similarities estimated by the MLLM are encoded into a lightweight transition matrix. Furthermore, MeGU introduces a positive-negative feature noise pair to explicitly disentangle the influence of the target concept. During finetuning, the negative noise suppresses target-specific feature patterns, while the positive noise reinforces the remaining associated features and aligns them with the perturbing concepts. This coordinated design selectively disrupts target-specific representations while preserving shared semantic structures. As a result, MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.

Haoyu Wang, Zhuo Huang, Xiaolong Wang, Bo Han, Zhiwei Lin, Tongliang Liu • 2026
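To make the mechanism described in the abstract concrete, below is a minimal sketch of one unlearning step, assuming a standard PyTorch classifier split into a feature backbone and a classification head. The function names, the learnable `neg_noise`/`pos_noise` parameterization, the loss weights, and the row-normalized transition matrix are illustrative assumptions for exposition, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def sample_perturbing_labels(transition_matrix, target_labels):
    # transition_matrix[c] is assumed to be a probability row over classes,
    # holding the MLLM-estimated conceptual similarities from class c
    # (self-similarity zeroed out, rows normalized to sum to 1).
    rows = transition_matrix[target_labels]               # (B, num_classes)
    return torch.multinomial(rows, num_samples=1).squeeze(1)

def unlearn_step(backbone, head, x_target, y_target,
                 transition_matrix, neg_noise, pos_noise,
                 lam_neg=1.0, lam_pos=1.0):
    """One finetuning step with the positive-negative feature noise pair.

    neg_noise / pos_noise are learnable feature-space perturbations with
    the same shape as the backbone output (an assumed parameterization).
    """
    y_perturb = sample_perturbing_labels(transition_matrix, y_target)
    feats = backbone(x_target)

    # Negative noise: suppress target-specific feature patterns by
    # ascending the loss on the original target label.
    loss_neg = -F.cross_entropy(head(feats + neg_noise), y_target)

    # Positive noise: reinforce the remaining associated features and
    # re-align them with the semantically close perturbing concept.
    loss_pos = F.cross_entropy(head(feats + pos_noise), y_perturb)

    return lam_neg * loss_neg + lam_pos * loss_pos
```

Sampling the perturbing label from the MLLM-derived similarity row, rather than fixing one substitute class, keeps the re-alignment concept-aware; the negated cross-entropy on the original label is the usual gradient-ascent surrogate for suppressing target-specific patterns.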

Related benchmarks

Task | Dataset | Metric | Result | Rank
Machine Unlearning | CIFAR-100 (test) | Forget Acc | 0.00e+0 | 43
Sub-class Machine Unlearning | CIFAR-20 | Ar (Accuracy Retention) | 83.1 | 35
Sub-class Unlearning | CIFAR-20 (test) | Ar (Accuracy Retention) | 96.05 | 28
Random Unlearning | CIFAR-10 original | Ar (Accuracy Retention) | 98.81 | 14
Class-wise Machine Unlearning | CIFAR-20 veh2 (test) | Ar (Accuracy Retention) | 95.82 | 14
Class-wise Machine Unlearning | CIFAR-20 veg (test) | Ar (Accuracy Retention) | 95.64 | 14
Machine Unlearning | CIFAR-100 class baby (test) | Ar (Accuracy Retention) | 76.7 | 7
Machine Unlearning | CIFAR-100 class dinosaur (test) | Ar (Accuracy Retention) | 76.55 | 7
Machine Unlearning | CIFAR-100 class wolf (test) | Ar (Accuracy Retention) | 76.3 | 7
Machine Unlearning | CIFAR-100 class lamp (test) | Ar (Accuracy Retention) | 76.11 | 7

Showing 10 of 11 rows.
