MM-SafetyBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack Defense | MM-SafetyBench | Attack Success Rate (ASR): 0.2 | 56 |
| Video Jailbreaking | MM-SafetyBench 1.0 (test) | Attack Success Rate: 96 | 48 |
| Safety Evaluation | MM-SafetyBench | Average ASR: 0 | 42 |
| Jailbreak Attack | MM-SafetyBench (tiny) | ASR: 99.16 | 25 |
| Safety Evaluation | MM-SafetyBench v1.0 (test) | ASR: 0.6 | 24 |
| Multimodal Safety Evaluation | MM-SafetyBench | Safety Score: 2.73 | 22 |
| Safety Evaluation | MM-SafetyBench (test) | Helpfulness Score: 68.95 | 20 |
| Jailbreaking Attack | MM-SafetyBench | Attack Success Rate (ASR): 95 | 20 |
| Jailbreak Attack Success Evaluation | MM-SafetyBench SD+TYPO | ASR: 81.4 | 18 |
| Jailbreak Attack Success Evaluation | MM-SafetyBench TYPO | Attack Success Rate (ASR): 79.6 | 18 |
| Jailbreak Attack Success Evaluation | MM-SafetyBench SD | ASR: 80.6 | 18 |
| Direct Malicious | MM-SafetyBench OOD | ASR: 0.71 | 16 |
| Response Safety | MM-SafetyBench (avg) | MS-R: 99 | 15 |
| MLLM Jailbreaking | MM-SafetyBench Physical Harm scenario | ASR: 6 | 15 |
| Multimodal Jailbreak Defense | MM-SafetyBench (full) | ASR (Illegal Activity - S): 1.03 | 12 |
| Multimodal Safety Defense | MM-SafetyBench SD_TYPO | Average ASR: 12 | 10 |
| Multimodal Safety Defense | MM-SafetyBench SD | Average ASR: 0.09 | 10 |
| Harmful Rate Evaluation | MM-SafetyBench OCR (test) | Illegal Activity Rate: 0 | 10 |
| Structured-based Jailbreak Attack Defense | MM-SafetyBench unseen attack types | ASR (SD): 2.35 | 9 |
| Multimodal Large Language Model Safety Evaluation | MM-SafetyBench++ | Illegal Activity Unsafe Refusal Rate: 100 | 9 |
| Jailbreak Detection | MM-SafetyBench | AUROC: 99.18 | 9 |
| Multimodal Safety Evaluation | MM-SafetyBench SD + TYPO + SD_TYPO (test) | ASR Score: 0.08 | 8 |
| Multimodal Safety Evaluation | MM-SafetyBench | Text-only Recall: 95.3 | 6 |
| Multimodal Jailbreaking | MM-SafetyBench QR (OOD) | ASR: 0 | 6 |
| Multimodal Jailbreaking | MM-SafetyBench WR (OOD) | ASR (%): 0 | 6 |
Showing 25 of 32 rows