PPU-Bench: Real World Benchmark for Personalized Partial Unlearning in Vision Language Models
About
Multimodal Large Language Models (MLLMs) may memorize sensitive cross-modal information during pretraining. However, existing MLLM unlearning benchmarks rely on synthetic knowledge injection or complete subject-level deletion, which fail to capture realistic, personalized deletion requests that require fine-grained factual control. In this paper, we introduce PPU-Bench, a real-world and fine-tuning-free benchmark for personalized partial unlearning in MLLMs. PPU-Bench contains 24K multimodal and unimodal samples derived from pre-existing knowledge of 500 public figures under three progressively challenging settings: Complete, Selective, and Personalized unlearning. The benchmark evaluates whether methods can remove target knowledge while preserving non-target facts, model utility, and cross-modal consistency. Extensive experiments show that Complete Unlearning often suppresses visual identity rather than factual knowledge, while Selective and Personalized Unlearning expose significant forget-retain trade-offs and challenges in intra-subject factual boundaries. Robustness analysis under cross-image and prompt-based attacks reveals distinct vulnerabilities across different unlearning settings. Motivated by these findings, we propose Boundary-Aware Optimization (BAO), which explicitly models intra-subject forget-retain boundaries. Experimental results on two representative methods demonstrate that BAO can effectively enforce intra-subject factual boundaries.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Understanding | MMBench | Classification Score | 87.24 | 42 |
| Knowledge Forgetting | Selective Forget-Set (forget) | Accuracy (Class VQA) | 65.26 | 14 |
| Knowledge Forgetting | Personalized Forget-Set (forget) | Accuracy (Class VQA) | 61.07 | 14 |
| Knowledge Forgetting | Complete Forget-Set 30% (forget) | Class. QA Accuracy | 61.4 | 14 |
| Knowledge Retention | Complete 30% (retain) | Class QA Accuracy | 63 | 14 |
| Knowledge Retention | Selective (retain) | Class VQA Accuracy | 57.14 | 14 |
| Knowledge Retention | Personalized (retain) | Class VQA Accuracy | 56.37 | 14 |
| Visual Question Answering | MMBench | Class Accuracy | 75.79 | 14 |
| Multimodal Machine Unlearning | Forget-Set 15% forgetting ratio | Class. VQA Accuracy | 50.77 | 7 |
| Multimodal Knowledge Retention | Retain-Set 5% forgetting ratio experiment | Accuracy (Class VQA) | 53.95 | 7 |