PPU-Bench: Real-World Benchmark for Personalized Partial Unlearning in Vision Language Models

About

Multimodal Large Language Models (MLLMs) may memorize sensitive cross-modal information during pretraining. However, existing MLLM unlearning benchmarks rely on synthetic knowledge injection or on complete subject-level deletion, and therefore fail to capture realistic, personalized deletion requests that require fine-grained control over individual facts. In this paper, we introduce PPU-Bench, a real-world, fine-tuning-free benchmark for personalized partial unlearning in MLLMs. PPU-Bench contains 24K multimodal and unimodal samples derived from the models' pre-existing knowledge of 500 public figures, organized into three progressively harder settings: Complete, Selective, and Personalized unlearning. The benchmark evaluates whether a method can remove target knowledge while preserving non-target facts, general model utility, and cross-modal consistency. Extensive experiments show that Complete Unlearning often suppresses recognition of a subject's visual identity rather than removing the underlying factual knowledge, while Selective and Personalized Unlearning expose significant forget-retain trade-offs and difficulty in respecting factual boundaries within a single subject. Robustness analysis under cross-image and prompt-based attacks reveals distinct vulnerabilities in each unlearning setting. Motivated by these findings, we propose Boundary-Aware Optimization (BAO), which explicitly models intra-subject forget-retain boundaries. Experiments with two representative unlearning methods show that BAO effectively enforces these intra-subject factual boundaries.
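To make the forget-retain evaluation concrete, here is a minimal sketch, assuming each benchmark item is a classification-style (multiple-choice) question about one fact of one public figure, tagged as either a deletion target (forget) or a fact to keep (retain). The names `MCQItem`, `split_accuracy`, `fact_id`, and `in_forget_set` are hypothetical and illustrative only; this is not PPU-Bench's official scoring code.

```python
from dataclasses import dataclass


@dataclass
class MCQItem:
    """One classification-style question (hypothetical schema, not the benchmark's)."""
    subject: str         # public figure the fact belongs to
    fact_id: str         # hypothetical identifier for a single fact
    answer: str          # gold option label, e.g. "B"
    in_forget_set: bool  # True if this fact is a deletion target


def split_accuracy(items, predictions):
    """Return (forget_acc, retain_acc) in percent.

    `predictions` maps fact_id -> predicted option label. After successful
    partial unlearning, forget-set accuracy should drop while retain-set
    accuracy (other facts about the same subject) should stay high.
    """
    counts = {True: [0, 0], False: [0, 0]}  # in_forget_set -> [correct, total]
    for item in items:
        bucket = counts[item.in_forget_set]
        bucket[1] += 1
        if predictions.get(item.fact_id) == item.answer:
            bucket[0] += 1

    def pct(correct, total):
        # Guard against an empty split instead of dividing by zero.
        return 100.0 * correct / total if total else float("nan")

    return pct(*counts[True]), pct(*counts[False])


# Hypothetical personalized request: forget two facts about one subject, keep two.
items = [
    MCQItem("Person A", "a_birthplace", "B", True),
    MCQItem("Person A", "a_spouse", "C", True),
    MCQItem("Person A", "a_occupation", "A", False),
    MCQItem("Person A", "a_education", "D", False),
]
preds = {"a_birthplace": "A", "a_spouse": "C", "a_occupation": "A", "a_education": "D"}
forget_acc, retain_acc = split_accuracy(items, preds)
print(forget_acc, retain_acc)  # 50.0 100.0
```

In this toy case the model still answers one forget-set fact correctly (`a_spouse`), so forgetting is incomplete, while both retain-set facts survive. Reporting the two accuracies separately, rather than a single combined score, is what makes the forget-retain trade-off visible.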

Jiahui Guang, Zexun Zhan, Zhenlin Xu, Cuiyun Gao, Haiyan Wang, Jing Li, Zhaoquan Gu, Yanchun Zhang · 2026

Related benchmarks

Task	Dataset	Metric	Score	#
Multimodal Understanding	MMBench	Classification Score	87.24	42
Knowledge Forgetting	Selective Forget-Set (forget)	Class. VQA Accuracy	65.26	14
Knowledge Forgetting	Personalized Forget-Set (forget)	Class. VQA Accuracy	61.07	14
Knowledge Forgetting	Complete Forget-Set 30% (forget)	Class. QA Accuracy	61.4	14
Knowledge Retention	Complete 30% (retain)	Class. QA Accuracy	63	14
Knowledge Retention	Selective (retain)	Class. VQA Accuracy	57.14	14
Knowledge Retention	Personalized (retain)	Class. VQA Accuracy	56.37	14
Visual Question Answering	MMBench	Class. Accuracy	75.79	14
Multimodal Machine Unlearning	Forget-Set (15% forgetting ratio)	Class. VQA Accuracy	50.77	7
Multimodal Knowledge Retention	Retain-Set (5% forgetting ratio)	Class. VQA Accuracy	53.95	7

(10 of 12 result rows listed.)
