
MeGU: Machine-Guided Unlearning with Target Feature Disentanglement

About

The growing concern over training-data privacy has elevated the "Right to be Forgotten" into a critical requirement, raising the demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental trade-off: aggressively erasing the influence of target data often degrades model utility on retained data, while conservative strategies leave residual target information intact. In this work, we analyze the intrinsic representation properties learned during model pretraining and demonstrate that semantic class concepts are entangled at the feature-pattern level: classes share associated features while retaining concept-specific discriminative components. This entanglement fundamentally limits the effectiveness of existing unlearning paradigms. Motivated by this insight, we propose Machine-Guided Unlearning (MeGU), a novel framework that guides unlearning through concept-aware re-alignment. Specifically, we leverage Multi-modal Large Language Models (MLLMs) to explicitly determine re-alignment directions for target samples by assigning semantically meaningful perturbing labels. To improve efficiency, the inter-class conceptual similarities estimated by the MLLM are encoded into a lightweight transition matrix. Furthermore, MeGU introduces a positive-negative feature noise pair to explicitly disentangle the influence of the target concept. During finetuning, the negative noise suppresses target-specific feature patterns, while the positive noise reinforces the remaining associated features and aligns them with the perturbing concepts. This coordinated design selectively disrupts target-specific representations while preserving shared semantic structures. As a result, MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.

Haoyu Wang, Zhuo Huang, Xiaolong Wang, Bo Han, Zhiwei Lin, Tongliang Liu • 2026
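To make the mechanism described in the abstract concrete, below is a minimal sketch of one unlearning step, assuming a standard PyTorch classifier split into a feature backbone and a classification head. The function names, the learnable `neg_noise`/`pos_noise` parameterization, the loss weights, and the row-normalized transition matrix are illustrative assumptions for exposition, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def sample_perturbing_labels(transition_matrix, target_labels):
    # transition_matrix[c] is assumed to be a probability row over classes,
    # holding the MLLM-estimated conceptual similarities from class c
    # (self-similarity zeroed out, rows normalized to sum to 1).
    rows = transition_matrix[target_labels]               # (B, num_classes)
    return torch.multinomial(rows, num_samples=1).squeeze(1)

def unlearn_step(backbone, head, x_target, y_target,
                 transition_matrix, neg_noise, pos_noise,
                 lam_neg=1.0, lam_pos=1.0):
    """One finetuning step with the positive-negative feature noise pair.

    neg_noise / pos_noise are learnable feature-space perturbations with
    the same shape as the backbone output (an assumed parameterization).
    """
    y_perturb = sample_perturbing_labels(transition_matrix, y_target)
    feats = backbone(x_target)

    # Negative noise: suppress target-specific feature patterns by
    # ascending the loss on the original target label.
    loss_neg = -F.cross_entropy(head(feats + neg_noise), y_target)

    # Positive noise: reinforce the remaining associated features and
    # re-align them with the semantically close perturbing concept.
    loss_pos = F.cross_entropy(head(feats + pos_noise), y_perturb)

    return lam_neg * loss_neg + lam_pos * loss_pos
```

Sampling the perturbing label from the MLLM-derived similarity row, rather than fixing one substitute class, keeps the re-alignment concept-aware; the negated cross-entropy on the original label is the usual gradient-ascent surrogate for suppressing target-specific patterns.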

Related benchmarks

Task | Dataset | Metric | Result | Rank
Machine Unlearning | CIFAR-100 (test) | Forget Acc | 0.00e+0 | 43
Sub-class Machine Unlearning | CIFAR-20 | Ar (Accuracy Retention) | 83.1 | 35
Sub-class Unlearning | CIFAR-20 (test) | Ar (Accuracy Retention) | 96.05 | 28
Random Unlearning | CIFAR-10 original | Ar (Accuracy Retention) | 98.81 | 14
Class-wise Machine Unlearning | CIFAR-20 veh2 (test) | Ar (Accuracy Retention) | 95.82 | 14
Class-wise Machine Unlearning | CIFAR-20 veg (test) | Ar (Accuracy Retention) | 95.64 | 14
Machine Unlearning | CIFAR-100 class baby (test) | Ar (Accuracy Retention) | 76.7 | 7
Machine Unlearning | CIFAR-100 class dinosaur (test) | Ar (Accuracy Retention) | 76.55 | 7
Machine Unlearning | CIFAR-100 class wolf (test) | Ar (Accuracy Retention) | 76.3 | 7
Machine Unlearning | CIFAR-100 class lamp (test) | Ar (Accuracy Retention) | 76.11 | 7

Showing 10 of 11 rows.
