NSFW

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
NSFW Concept Generation	NSFW-200 Violence v2.1 (test)	ASR-1 66		70
NSFW Concept Generation	NSFW-200 Sex v2.1 (test)	ASR-1 62		70
Text Safety Filter Bypass	NSFW-200	Text Bypass Rate 31		13
Concept Unlearning Preservation	NSFW	CSDR 12.76		12
Adversarial Robustness	NSFW	ASR 4.69		11
Image Safety Filter Bypass	NSFW-200	Image Bypass Rate 69.4		7
Harmful prompt detection	NSFW56k	Accuracy 99		6
NSFW Detection	NSFW56k	Acceptance Rate (ASR) 97		5
Text-to-Image Jailbreak Attack	NSFW-200	Attack Success Rate (Paper) 100		3
Text Safety Filter Bypass	NSFW-200 Target Filter: DeepSeek-V3 1.0 (cross-filter transfer)	Bypass Rate 87.5		3
Text Safety Filter Bypass	NSFW-200 Target Filter: GPT-4.1 1.0 (cross-filter transfer)	Bypass Rate 97.5		3
Text Safety Filter Bypass	NSFW-200 Target Filter: ShieldLM 1.0 (cross-filter transfer)	Bypass Rate 92.5		3
Text-to-Image Generation	NSFW-200	CLIP Score 0.2762		3
Jailbreaking	NSFW-200 1.0 (test)	Bypass Rate 90.5		3
NSFW Safety Evaluation	NSFW	Metric -		0
