| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Classification | MLLMU-Bench Forget Set | Accuracy | 63.33 | 51 |
| Visual Question Answering (VQA) | MLLMU-Bench 5% (forget) | Accuracy (Classification) | 55 | 42 |
| Generation | MLLMU-Bench (Forget Set) | Rouge Score | 64.5 | 37 |
| Multimodal Machine Unlearning Evaluation | MLLMU-Bench Forget Set | Classification Accuracy | 54.67 | 36 |
| Classification | MLLMU-Bench (Retain Set) | Accuracy | 67.07 | 32 |
| Classification | MLLMU-Bench (test) | Accuracy | 52.5 | 32 |
| Cloze | MLLMU-Bench (Forget Set) | Cloze Accuracy | 26.09 | 32 |
| MLLM Unlearning | MLLMU-Bench Retain Set 10% ratio | Cloze Accuracy | 34.4 | 30 |
| MLLM Unlearning | MLLMU-Bench 10% ratio (test) | Cloze Accuracy | 40 | 30 |
| MLLM Unlearning | MLLMU-Bench forget set, 10% ratio | Cloze Accuracy | 40 | 30 |
| Multimodal Language Model Unlearning | MLLMU-Bench 1.0 (Retain Set) | Cloze Accuracy | 30 | 30 |
| Multimodal Language Model Unlearning | MLLMU-Bench 1.0 (test) | Cloze Accuracy | 38.37 | 30 |
| Multimodal Language Model Unlearning | MLLMU-Bench forget set 1.0 | Cloze Accuracy | 35.14 | 30 |
| Open-Ended Generation | MLLMU-Bench (Retain Set) | ROUGE-L | 53.1 | 30 |
| Cloze Task | MLLMU-Bench (Retain Set) | Accuracy | 24.52 | 30 |
| Open-Ended Generation | MLLMU-Bench (test) | ROUGE-L | 34.5 | 30 |
| Cloze Task | MLLMU-Bench (test) | Accuracy | 13.04 | 30 |
| Multimodal Machine Unlearning Evaluation | MLLMU-Bench Real Celebrity | Class Acc | 56.41 | 28 |
| Multimodal Machine Unlearning Evaluation | MLLMU-Bench (test) | Classification Accuracy | 47.86 | 27 |
| Multimodal Machine Unlearning | MLLMU-Bench LLaVA-1.5-7B (test 2) | Forget Rate | 62.8 | 24 |
| Multimodal Machine Unlearning | MLLMU-Bench LLaVA-1.5-7B (test 1) | Forget Rate | 65.4 | 24 |
| Visual Question Answering (VQA) | MLLMU-Bench 5% forget (Real) | Classification Accuracy | 78.59 | 21 |
| Visual Question Answering (VQA) | MLLMU-Bench 5% forget (test) | Classification Accuracy | 62.5 | 21 |
| Visual Question Answering (VQA) | MLLMU-Bench 5% forget | Contextual Refusal Rate | 0.01 | 18 |
| Multimodal Machine Unlearning | MLLMU-Bench | Forget VQA Accuracy | 29.6 | 16 |