
Safety Prompts

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Safety Evaluation | 901 Safety Prompts (test) | Average Rank: 4.1337 | 11
Safety Assessment | Safety Prompts (randomly selected 200 samples per field) | Insensitivity Score: 1.5 | 9
Attack Success Rate Evaluation | HRL/LRL Safety Prompts English Multi-Image v1 | ASR: 2 | 6
Attack Success Rate Evaluation | HRL/LRL Safety Prompts English Text v1 | ASR: 1 | 6