From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models
About
Large language models (LLMs) may memorize sensitive or copyrighted content, raising significant privacy and legal concerns. While machine unlearning has emerged as a potential remedy, prevailing paradigms rely on user-provided forget sets, making unlearning requests difficult to audit and exposing systems to secondary leakage and malicious abuse. We propose MAGE, a Memory-grAph Guided Erasure framework for user-minimized, corpus-free unlearning. Given only a lightweight user anchor that identifies a target entity, MAGE probes the target LLM to recover target-related memorization, organizes it into a weighted local memory graph, and synthesizes scoped supervision for unlearning. MAGE is model-agnostic, can be plugged into standard unlearning methods, and requires no access to the original training corpus. Experiments on two benchmarks, TOFU and RWKU, demonstrate that MAGE's self-generated supervision achieves unlearning performance comparable to that of supervision generated with external references, while preserving overall utility. These results support a practical and auditable unlearning workflow driven by minimal anchors rather than user-supplied forget corpora.
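The abstract describes a three-step flow: probe the model from an anchor, organize what is recovered into a weighted local graph, and synthesize scoped supervision. The sketch below is only an illustration of that flow under strong assumptions and is not the authors' implementation. The `probe` callable, the use of recall frequency as an edge weight, the question template, and the refusal target are all hypothetical, and `fake_probe` is a stub standing in for real queries to a target LLM.

```python
from collections import Counter, deque

def build_memory_graph(anchor, probe, n_samples=4, max_hops=1):
    """Breadth-first probing from the anchor entity.

    `probe(entity, sample_idx)` is a hypothetical callable returning
    (subject, relation, object) triples recalled by the target model.
    Edge weight (an assumption) = fraction of samples recalling the triple.
    """
    counts = Counter()
    seen = {anchor}
    frontier = deque([(anchor, 0)])
    while frontier:
        entity, depth = frontier.popleft()
        for i in range(n_samples):
            # Count each triple at most once per sample, and only
            # triples whose subject is the entity being probed.
            for triple in set(probe(entity, i)):
                if triple[0] == entity:
                    counts[triple] += 1
        if depth < max_hops:
            for subj, _, obj in list(counts):
                if subj == entity and obj not in seen:
                    seen.add(obj)
                    frontier.append((obj, depth + 1))
    return {t: c / n_samples for t, c in counts.items()}

def synthesize_supervision(graph, min_weight=0.75):
    """Turn well-supported edges into (question, target) pairs.

    The template and the refusal string are illustrative choices.
    """
    examples = []
    for (subj, rel, _), weight in sorted(graph.items()):
        if weight >= min_weight:
            examples.append({
                "question": f"What is the {rel} of {subj}?",
                "target": "I don't know.",
            })
    return examples

def fake_probe(entity, sample_idx):
    """Stub in place of real LLM queries; the facts are made up."""
    facts = {
        "Alice Example": [
            ("Alice Example", "occupation", "novelist"),
            ("Alice Example", "birthplace", "Springfield"),
        ],
    }
    triples = list(facts.get(entity, []))
    if sample_idx % 2 == 1:
        # Mimic inconsistent recall: drop one fact on odd samples.
        triples = [t for t in triples if t[1] != "birthplace"]
    return triples

graph = build_memory_graph("Alice Example", fake_probe)
examples = synthesize_supervision(graph)
```

In this toy run the consistently recalled fact receives a higher weight than the intermittently recalled one, so only the former passes the threshold and becomes a supervision example. The resulting pairs would then be handed to whichever standard unlearning method is in use.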
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| General Language Model Evaluation | Utility Set MMLU, BBH, TruthfulQA, TriviaQA, AlpacaEval | MMLU | 68.64 | 34 |
| Knowledge Unlearning | RWKU (Forget Set) | FB | 63.73 | 23 |
| Unlearning | TOFU Neighbor Set | FB Score | 63.48 | 17 |
| Knowledge Retention | RWKU (Neighbor Set) | FB Score | 62.63 | 17 |
| Membership Inference Attack | TOFU MIA Set | FM | 1.8984 | 17 |
| Membership Inference Attack | RWKU MIA Set | FM Score | 2.1054 | 17 |
| Unlearning | TOFU Forget Set | FB | 64.84 | 17 |
| Machine Unlearning | TOFU finetuned Llama-2-7b-chat (forget set) | Probability | 43.93 | 14 |
| Machine Unlearning | TOFU | Probability | 51.74 | 13 |