
Safety Evaluation Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Evaluation | Safety Evaluation Suite HarmBench, StrongReject, WildJailbreak, XSTest | HarmBench Score: 68.44 | 28 |
| Safety | Safety Evaluation Suite (Salad-Bench, WildJailbreak, JailbreakBench, WildChat, WildGuard) | Safety Rate (S.R.): 100 | 24 |
| Safety | Safety Evaluation Suite | Score: 0.911 | 9 |