VLGuard

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Evaluation | VLGuard | ASR (Before): 0.23 | 24 |
| Vision-text safety classification | VLGuard | AUPRC (Prompt): 0.8843 | 9 |
| Unsafe content detection | VLGuard | F1 Score: 79.3 | 9 |
| Multimodal Safety Evaluation | VLGuard (test) | Accuracy: 86.78 | 6 |
| Multimodal Jailbreaking | VLGuard Unsafe (OOD) | ASR: 66.7 | 6 |
| Over-Prudence Evaluation | VLGuard | RR (Before): 4.48 | 6 |
| Jailbreak Attack | VLGuard Safe | Attack Success Rate (ASR): 8.44 | 5 |
| Jailbreak Attack | VLGuard Image Unsafe | ASR: 52.49 | 5 |
| Jailbreak Attack | VLGuard Text Unsafe | ASR: 34.59 | 5 |
| Jailbreak Attack | VLGuard (All) | ASR: 17.88 | 5 |