Harmful prompt detection on NSFW56k
[Chart: results over time, metric selectable as Accuracy or ASR. Top Accuracy: HyPE, 99 (Apr 7, 2026).]
Evaluation Results

| Method | Setting | Date | Accuracy | ASR |
|---|---|---|---|---|
| HyPE | | 2026.04 | 99 | - |
| NSFW-Classifier | | 2026.04 | 95 | - |
| DiffGuard | | 2026.04 | 89 | - |
| Latent Guard | | 2026.04 | 52 | - |
| Detoxify (Orig) | | 2026.04 | 34 | - |
| GuardT2I | | 2026.04 | 9 | - |
| NSFW-Classifier | Attack Type=Style Atta... | 2026.04 | - | 82 |
| DiffGuard | Attack Type=Style Atta... | 2026.04 | - | 92 |
| Detoxify (Orig) | Attack Type=Style Atta... | 2026.04 | - | 91 |
| Latent Guard | Attack Type=Style Atta... | 2026.04 | - | 94 |
| HyPE | Attack Type=Style Atta... | 2026.04 | - | 27 |
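This page does not define its two metrics. Under the common reading, which is an assumption and is not stated here, Accuracy is the share of prompts a detector labels correctly as harmful or benign. ASR (attack success rate) is the share of adversarial harmful prompts, such as the style-attack variants above, that get past the detector without being flagged, so a lower ASR is better. The table values appear to be percentages. The following is a minimal Python sketch of those assumed definitions. The function names and toy data are hypothetical and do not come from NSFW56k or from any of the listed methods.

```python
from typing import Sequence


def detection_accuracy(predicted: Sequence[bool], is_harmful: Sequence[bool]) -> float:
    """Assumed definition: fraction of prompts whose harmful/benign label is predicted correctly."""
    if len(predicted) != len(is_harmful) or not predicted:
        raise ValueError("need equal-length, non-empty label sequences")
    correct = sum(p == t for p, t in zip(predicted, is_harmful))
    return correct / len(predicted)


def attack_success_rate(flagged_on_adversarial: Sequence[bool]) -> float:
    """Assumed definition: fraction of adversarial harmful prompts the detector fails to flag."""
    if not flagged_on_adversarial:
        raise ValueError("need at least one adversarial prompt")
    evaded = sum(not flagged for flagged in flagged_on_adversarial)
    return evaded / len(flagged_on_adversarial)


# Hypothetical toy data, not taken from the benchmark.
preds = [True, True, False, False, True]   # detector output: True = flagged as harmful
labels = [True, False, False, False, True]  # ground truth: True = actually harmful
adv_flags = [True, False, True, True]       # detector output on adversarial harmful prompts

print(f"Accuracy: {100 * detection_accuracy(preds, labels):.0f}")  # 4 of 5 correct
print(f"ASR: {100 * attack_success_rate(adv_flags):.0f}")          # 1 of 4 evaded
```

Under these definitions a strong detector scores high on Accuracy and low on ASR. That pattern matches HyPE's entries in the table, with 99 Accuracy and 27 ASR.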