From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models

About

Large language models (LLMs) may memorize sensitive or copyrighted content, raising significant privacy and legal concerns. While machine unlearning has emerged as a potential remedy, prevailing paradigms rely on user-provided forget sets, making unlearning requests difficult to audit and exposing systems to secondary leakage and malicious abuse. We propose MAGE, a Memory-grAph Guided Erasure framework for user-minimized, corpus-free unlearning. Given only a lightweight user anchor that identifies a target entity, MAGE probes the target LLM to recover target-related memorization, organizes it into a weighted local memory graph, and synthesizes scoped supervision for unlearning. MAGE is model-agnostic, can be plugged into standard unlearning methods, and requires no access to the original training corpus. Experiments on two benchmarks, TOFU and RWKU, demonstrate that MAGE's self-generated supervision achieves unlearning performance comparable to that of supervision generated with external reference, while preserving overall utility. These results support a practical and auditable unlearning workflow driven by minimal anchors rather than user-supplied forget corpora.

Wenxuan Li, Zhenfei Zhang, Mi Zhang, Geng Hong, Mi Wen, Xiaoyu You, Min Yang • 2026
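
The three steps named in the abstract (probe the model from an anchor, organize what comes back into a weighted local memory graph, synthesize scoped supervision) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how probing, edge weighting, or supervision synthesis are done. The sketch assumes a hypothetical `probe` callable that returns (subject, relation, object) triples, assumes edge weights are simply the fraction of probe samples in which a triple recurs, and uses a hypothetical question template. The stub probe and the entity "Alice Example" are made up for illustration.

```python
import itertools
from collections import Counter
from typing import Callable, Iterable

# A recalled fact, stored as a (subject, relation, object) triple.
Fact = tuple[str, str, str]


def build_memory_graph(
    anchor: str,
    probe: Callable[[str], Iterable[Fact]],
    n_samples: int = 4,
) -> dict[Fact, float]:
    """Probe repeatedly and weight each triple by how often it recurs.

    ASSUMPTION: recurrence frequency stands in for memorization strength;
    the paper's actual weighting scheme is not given in the abstract.
    """
    counts: Counter[Fact] = Counter()
    for _ in range(n_samples):
        # set() so a triple repeated inside one sample is counted once.
        counts.update(set(probe(anchor)))
    return {fact: count / n_samples for fact, count in counts.items()}


def synthesize_forget_set(
    graph: dict[Fact, float], min_weight: float = 0.5
) -> list[dict]:
    """Turn sufficiently strong edges into QA pairs (hypothetical template)."""
    ranked = sorted(graph.items(), key=lambda item: -item[1])
    return [
        {"question": f"What is the {rel} of {subj}?", "answer": obj, "weight": w}
        for (subj, rel, obj), w in ranked
        if w >= min_weight
    ]


def make_stub_probe() -> Callable[[str], Iterable[Fact]]:
    """Stand-in for querying the target LLM; cycles through canned samples."""
    samples = [
        [("Alice Example", "occupation", "novelist"),
         ("Alice Example", "birthplace", "Oslo")],
        [("Alice Example", "occupation", "novelist")],
        [("Alice Example", "occupation", "novelist"),
         ("Alice Example", "award", "Fictional Prize")],
        [("Alice Example", "occupation", "novelist"),
         ("Alice Example", "birthplace", "Oslo")],
    ]
    cycle = itertools.cycle(samples)
    return lambda anchor: next(cycle)


graph = build_memory_graph("Alice Example", make_stub_probe(), n_samples=4)
forget_set = synthesize_forget_set(graph, min_weight=0.5)
for item in forget_set:
    print(item)
```

In this toy run the low-frequency "award" triple falls below the threshold and is left out of the forget set. That is one simple way to keep supervision scoped to what the model recalls consistently. The resulting QA pairs would then be handed to whichever standard unlearning method is in use.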

Related benchmarks

Task	Dataset	Result
General Language Model Evaluation	Utility Set (MMLU, BBH, TruthfulQA, TriviaQA, AlpacaEval)	MMLU 68.64	34
Machine Unlearning	TOFU	ROUGE 63.23	24
Knowledge Unlearning	RWKU (Forget Set)	FB 63.73	23
Unlearning	TOFU Neighbor Set	FB Score 63.48	17
Knowledge Retention	RWKU (Neighbor Set)	FB Score 62.63	17
Membership Inference Attack	TOFU MIA Set	FM 1.8984	17
Membership Inference Attack	RWKU MIA Set	FM Score 2.1054	17
Unlearning	TOFU Forget Set	FB 64.84	17
Machine Unlearning	TOFU finetuned Llama-2-7b-chat (forget set)	Probability 43.93	14
