VLGuard

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Target Response Induction | VLGuard | ASR 0.1 | 48 |
| Safety Evaluation | VLGuard | ASR 0.23 | 27 |
| Safety and Helpfulness Evaluation | VLGuard | Safety Score 88.48 | 18 |
| Vision-text safety classification | VLGuard | AUPRC (Prompt) 0.8843 | 9 |
| Unsafe content detection | VLGuard | F1 Score 79.3 | 9 |
| Multimodal Safety Evaluation | VLGuard (test) | Accuracy 86.78 | 6 |
| Multimodal Jailbreaking | VLGuard Unsafe (OOD) | ASR 66.7 | 6 |
| Over-Prudence Evaluation | VLGuard | RR (Before) 4.48 | 6 |
| Jailbreak Attack | VLGuard Safe | Attack Success Rate (ASR) 8.44 | 5 |
| Jailbreak Attack | VLGuard Image Unsafe | ASR 52.49 | 5 |
| Jailbreak Attack | VLGuard Text Unsafe | ASR 34.59 | 5 |
| Jailbreak Attack | VLGuard (All) | ASR 17.88 | 5 |
| Safety Robustness | VLGuard Unsafe | Attack Success Rate 13.4 | 4 |
| Safety Robustness | VLGuard Safe_Unsafe | Attack Success Rate 14.9 | 4 |