
Safety

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Evaluation | Safety | Score 92.1 | 27 |
| Alignment and Safety Evaluation | Safety | Avg@k 73 | 15 |
| Safety Assessment | Safety Avg. | MAE 2.6912 | 14 |
| Safety | Safety OOD | Accuracy 93.14 | 13 |
| Safety | Safety ID | Accuracy 99.81 | 13 |
| Safety Evaluation | Safety Tweet Eval, Hatecheck, Ethos (test) | Accuracy 83.3 | 12 |
| Safety Alignment | Safety BeaverTails, HEx-PHI (test) | BeaverTails Score 95.67 | 10 |
| Safety Evaluation | Safety Overall | Reasoning Accuracy (Avg) 32.9 | 4 |