
Safety

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Evaluation | Safety | Score: 92.1 | 27 |
| Safety Evaluation | Safety Tweet Eval, Hatecheck, Ethos (test) | Accuracy: 83.3 | 12 |
| Safety Alignment | Safety BeaverTails, HEx-PHI (test) | BeaverTails Score: 95.67 | 10 |