From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning
About
Although machine unlearning is essential for removing private, harmful, or copyrighted content from LLMs, current benchmarks often fail to faithfully represent the true "forgetting scope" learned by the model. We formalize two distinct unlearning granularities, domain-level and instance-level, and propose BiForget, an automated framework for synthesizing high-quality forget sets. Unlike prior work that relies on external generators, BiForget uses the target model itself to elicit data that matches its internal knowledge distribution, through seed-guided and adversarial prompting. Our experiments across diverse benchmarks show that it achieves a superior balance of relevance, diversity, and efficiency. Quantitatively, in the Harry Potter domain, it improves relevance by approximately 20 and diversity by approximately 0.05 while halving the total data size compared to state-of-the-art methods. Ultimately, it facilitates more robust forgetting and better utility preservation, providing a more rigorous foundation for evaluating LLM unlearning.
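To make the idea of eliciting a forget set from the target model more concrete, the following is a minimal sketch of a seed-guided collection loop. It is not the authors' implementation: the function name `synthesize_forget_set`, the `generate` callable (standing in for any call to the target model), the prompt templates, and the whitespace- and case-insensitive deduplication rule are all assumptions made for illustration, and the adversarial prompting step mentioned above is not modeled.

```python
from typing import Callable, Iterable, List


def synthesize_forget_set(
    generate: Callable[[str], str],  # hypothetical wrapper around the target model
    seeds: Iterable[str],            # seed topics or entities for the forget domain
    templates: Iterable[str],        # prompt templates containing a "{seed}" slot
    max_items: int = 100,
) -> List[str]:
    """Collect deduplicated target-model outputs for each (seed, template) pair."""
    seen = set()
    forget_set: List[str] = []
    templates = list(templates)  # allow re-iteration for every seed
    for seed in seeds:
        for template in templates:
            prompt = template.format(seed=seed)
            text = generate(prompt).strip()
            # Normalize case and whitespace so near-identical outputs count once.
            key = " ".join(text.lower().split())
            if not key or key in seen:
                continue
            seen.add(key)
            forget_set.append(text)
            if len(forget_set) >= max_items:
                return forget_set
    return forget_set


def toy_model(prompt: str) -> str:
    """Stand-in for the target model, used only to make the sketch runnable."""
    return f"echo: {prompt}"
```

In practice `generate` would wrap a sampling call to the model being unlearned, so that the collected text reflects what that model actually knows rather than what an external generator produces.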
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 62.7 | 844 |
| Knowledge Unlearning | WMDP bio | Accuracy | 70.38 | 51 |
| Knowledge Unlearning | WMDP cyber | Accuracy | 46.9 | 47 |
| Unlearning | TOFU (forget01) | Forgetting Quality (F.Q.) | 92 | 10 |
| Machine Unlearning | HP book | VerbMem | 0.00e+0 | 6 |
| Machine Unlearning | Textbook | VerbMem | 0.0106 | 6 |
| Machine Unlearning | BiForget | VerbMem | 0.00e+0 | 6 |