Safety Filter Bypass on Misbinding Prompt
Chart: NSFW-TC Score over time. Plotted result: Attribute Misbinding Attack, 44.6 (Dec 17, 2025). Selectable metrics: NSFW-TC Score, Latent Guard Score, Detoxify Score, DeepSeek-R1 Score, GPT-4o Score, Overall Score.
Evaluation Results
| Method | Links | Date | NSFW-TC Score | Latent Guard Score | Detoxify Score | DeepSeek-R1 Score | GPT-4o Score | Overall Score |
|---|---|---|---|---|---|---|---|---|
| Attribute Misbinding Attack | Source: - | 2025.12 | 44.6 | 73.13 | 99.8 | 51.59 | 46.37 | 40.56 |