
UnsafeBench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Harmful Content Detection | UnsafeBench | AUPRC | 71.7 | 49 |
| Safety Classification | UnsafeBench | AUROC | 80.5 | 49 |
| Safety evaluation | UnsafeBench | F1 Score | 89 | 24 |
| Safety Classification | UnsafeBench | ECE | 0.061 | 21 |
| Visual Compliance Verification | UnsafeBench | Unsafe F1 | 76 | 15 |
| Binary Safety Classification | UnsafeBench | Sexual | 35.5 | 13 |
| Safety Evaluation | UnsafeBench (test) | F1 Score | 81 | 11 |
| Content Moderation | UnsafeBench Sexual category (test) | Accuracy | 81.4 | 8 |
| Multimodal Content Moderation | UnsafeBench | Accuracy | 76.7 | 4 |
| Multimodal Content Moderation | UnsafeBench Sexual Text-Only | Accuracy | 81.82 | 3 |
| Multimodal Content Moderation | UnsafeBench Sexual Text+Visual | Accuracy | 81.08 | 3 |