NSFW

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| NSFW Concept Generation | NSFW-200 Violence v2.1 (test) | ASR-166 | 70 |
| NSFW Concept Generation | NSFW-200 Sex v2.1 (test) | ASR-162 | 70 |
| Text Safety Filter Bypass | NSFW-200 | Text Bypass Rate 31 | 13 |
| Concept Unlearning Preservation | NSFW | CSDR 12.76 | 12 |
| Adversarial Robustness | NSFW | ASR 4.69 | 11 |
| Image Safety Filter Bypass | NSFW-200 | Image Bypass Rate 69.4 | 7 |
| Harmful prompt detection | NSFW56k | Accuracy 99 | 6 |
| NSFW Detection | NSFW56k | Acceptance Rate (ASR) 97 | 5 |
| Text Safety Filter Bypass | NSFW-200 Target Filter: DeepSeek-V3 1.0 (cross-filter transfer) | Bypass Rate 87.5 | 3 |
| Text Safety Filter Bypass | NSFW-200 Target Filter: GPT-4.1 1.0 (cross-filter transfer) | Bypass Rate 97.5 | 3 |
| Text Safety Filter Bypass | NSFW-200 Target Filter: ShieldLM 1.0 (cross-filter transfer) | Bypass Rate 92.5 | 3 |
| Text-to-Image Generation | NSFW-200 | CLIP Score 0.2762 | 3 |
| Jailbreaking | NSFW-200 1.0 (test) | Bypass Rate 90.5 | 3 |
| NSFW Safety Evaluation | NSFW | Metric - | 0 |