PPU-Bench: Real World Benchmark for Personalized Partial Unlearning in Vision Language Models

About

Multimodal Large Language Models (MLLMs) may memorize sensitive cross-modal information during pretraining. However, existing MLLM unlearning benchmarks rely on synthetic knowledge injection or complete subject-level deletion, which fail to capture realistic, personalized deletion requests that require fine-grained factual control. In this paper, we introduce PPU-Bench, a real-world and fine-tuning-free benchmark for personalized partial unlearning in MLLMs. PPU-Bench contains 24K multimodal and unimodal samples derived from pre-existing knowledge of 500 public figures under three progressively challenging settings: Complete, Selective, and Personalized unlearning. The benchmark evaluates whether methods can remove target knowledge while preserving non-target facts, model utility, and cross-modal consistency. Extensive experiments show that Complete Unlearning often suppresses visual identity rather than factual knowledge, while Selective and Personalized Unlearning expose significant forget-retain trade-offs and challenges in intra-subject factual boundaries. Robustness analysis under cross-image and prompt-based attacks reveals distinct vulnerabilities across different unlearning settings. Motivated by these findings, we propose Boundary-Aware Optimization (BAO), which explicitly models intra-subject forget-retain boundaries. Experimental results on two representative methods demonstrate that BAO can effectively enforce intra-subject factual boundaries.

Jiahui Guang, Zexun Zhan, Zhenlin Xu, Cuiyun Gao, Haiyan Wang, Jing Li, Zhaoquan Gu, Yanchun Zhang • 2026

Related benchmarks

Task | Dataset | Result | Rank
Multimodal Understanding | MMBench | Classification Score: 87.24 | 42
Knowledge Forgetting | Selective Forget-Set (forget) | Accuracy (Class VQA): 65.26 | 14
Knowledge Forgetting | Personalized Forget-Set (forget) | Accuracy (Class VQA): 61.07 | 14
Knowledge Forgetting | Complete Forget-Set 30% (forget) | Class. QA Accuracy: 61.4 | 14
Knowledge Retention | Complete 30% (retain) | Class QA Accuracy: 63 | 14
Knowledge Retention | Selective (retain) | Class VQA Accuracy: 57.14 | 14
Knowledge Retention | Personalized (retain) | Class VQA Accuracy: 56.37 | 14
Visual Question Answering | MMBench | Class Accuracy: 75.79 | 14
Multimodal Machine Unlearning | Forget-Set 15% forgetting ratio | Class. VQA Accuracy: 50.77 | 7
Multimodal Knowledge Retention | Retain-Set 5% forgetting ratio experiment | Accuracy (Class VQA): 53.95 | 7
Showing 10 of 12 rows
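
The forget and retain accuracies listed above are reported per split. As a rough illustration of how such a pair of numbers could be computed from per-question results, here is a minimal Python sketch. It is not the benchmark's evaluation code: the `Sample` record, the `"forget"`/`"retain"` labels, and the `gap` summary are all hypothetical names introduced for this example, and it assumes the common reading that lower forget-set accuracy and higher retain-set accuracy are the desired outcome.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One evaluated question (hypothetical record layout)."""
    subject: str   # public figure the question is about
    split: str     # "forget" or "retain" (assumed labels)
    correct: bool  # whether the model answered correctly


def split_accuracy(samples, split):
    """Accuracy in percent over the samples that belong to one split."""
    chosen = [s for s in samples if s.split == split]
    if not chosen:
        raise ValueError(f"no samples in split {split!r}")
    return 100.0 * sum(s.correct for s in chosen) / len(chosen)


def forget_retain_report(samples):
    """Summarize both splits; 'gap' is retain accuracy minus forget accuracy."""
    forget_acc = split_accuracy(samples, "forget")
    retain_acc = split_accuracy(samples, "retain")
    return {
        "forget_acc": forget_acc,
        "retain_acc": retain_acc,
        "gap": retain_acc - forget_acc,
    }


# Toy data for a single subject: facts are partitioned into forget and retain.
toy = [
    Sample("subject_a", "forget", False),
    Sample("subject_a", "forget", False),
    Sample("subject_a", "forget", True),
    Sample("subject_a", "forget", False),
    Sample("subject_a", "retain", True),
    Sample("subject_a", "retain", True),
    Sample("subject_a", "retain", False),
    Sample("subject_a", "retain", True),
]
report = forget_retain_report(toy)
print(report)
```

Because both splits here come from the same subject, the toy data mirrors the intra-subject setting described in the abstract, where a method has to forget some facts about a person while keeping others.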
