Safety

Benchmarks

Task Name          | Dataset Name                                | SOTA Result              | Trend
Safety Evaluation  | Safety                                      | Score: 92.1              | 27
Safety Evaluation  | Safety Tweet Eval, Hatecheck, Ethos (test)  | Accuracy: 83.3           | 12
Safety Alignment   | Safety BeaverTails, HEx-PHI (test)          | BeaverTails Score: 95.67 | 10