Geometric-disentanglement Unlearning

About

Large language models (LLMs) can internalize private or harmful content, motivating unlearning that removes a forget set while preserving retained knowledge. However, forgetting updates often cause collateral degradation of retained knowledge, creating a persistent trade-off. Existing LLM unlearning methods are often heuristic, and other theoretical approaches rely on offline feature constructions that do not capture update-time forget-retain interaction in LLMs. To address this limitation, we aim to develop an LLM unlearning method that reduces the forget-retain trade-off with theoretical guarantees. We take a first-principles view by formalizing "no side effects" as local retain invariance under small parameter updates, and prove an equivalence under optimizer-induced geometry: the retain loss is locally invariant if and only if the update direction is orthogonal to the subspace spanned by retain gradients. Based on this insight, we propose Geometric-disentanglement Unlearning (GU), a lightweight and theoretically grounded projection that can be plugged into existing gradient-based unlearning methods to mitigate forget-retain side effects. Experiments on TOFU, MUSE, and WMDP-cyber show that GU strengthens forgetting while reducing retain drift. When added to SimNPO, it achieves up to 62% improved forgetting Extraction Strength (ES) and 31% higher retain ES. Our code is open-sourced at https://github.com/Lemutisme/Geometric-Unlearning.
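
The orthogonality condition in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation: it assumes a plain Euclidean inner product (the abstract states the result under an optimizer-induced geometry, which may differ), treats gradients as flat lists of floats, and uses hypothetical helper names (`dot`, `orthonormal_basis`, `project_out`). The idea shown is only that removing the component of a forget update lying in the span of the retain gradients leaves an update orthogonal to every retain gradient, so the retain loss is unchanged to first order.

```python
# Minimal sketch, assuming a Euclidean inner product and flat gradient vectors.
# All function names here are hypothetical, not taken from the paper's code.

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip (near-)linearly dependent gradients
            basis.append([wi / norm for wi in w])
    return basis

def project_out(update, retain_grads):
    """Remove from `update` its component in the span of `retain_grads`."""
    projected = list(update)
    for q in orthonormal_basis(retain_grads):
        c = dot(projected, q)
        projected = [p - c * qi for p, qi in zip(projected, q)]
    return projected

# Toy example: two retain gradients spanning the x-y plane in 3 dimensions.
retain_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
forget_update = [0.5, -2.0, 3.0]

safe_update = project_out(forget_update, retain_grads)

# To first order, the retain-loss change is <retain gradient, update>,
# which vanishes for every retain gradient after projection.
first_order_changes = [dot(g, safe_update) for g in retain_grads]
```

In this toy case only the component along the third axis survives, since the retain gradients span the first two axes. In a real model the retain gradients would be estimated from mini-batches and the subspace would need to be kept low-dimensional for the projection to stay cheap; how the paper handles that is not specified in the abstract.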

Duo Zhou, Yuji Zhang, Tianxin Wei, Ruizhong Qiu, Ke Yang, Xiao Lin, Cheng Qian, Jingrui He, Hanghang Tong, Chengxiang Zhai, Heng Ji, Huan Zhang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Machine Unlearning | CIFAR-10 Random forgetting | Retention Accuracy (RA) | 98.6 | 11 |
| Random Forgetting Unlearning | CIFAR-100 | Retention Accuracy (RA) | 96.48 | 11 |
| Machine Unlearning | CIFAR-10 Class-wise forgetting | Retention Accuracy (RA) | 94.2 | 11 |
| Class-wise Forgetting Unlearning | CIFAR-100 | Retention Accuracy (RA) | 90.34 | 11 |
| Machine Unlearning | Tiny-ImageNet Random forgetting | Retention Accuracy (RA) | 99.98 | 10 |
| Machine Unlearning | Tiny-ImageNet (Class-wise forgetting) | Retention Accuracy (RA) | 99.54 | 10 |
