
Harmlessness

Benchmarks

Task Name                 Dataset Name                    SOTA Result                Trend
LLM Alignment             Harmlessness                    WR: 87.85                  27
Harmlessness              Harmlessness                    Average Win Rate: 96       21
Value Alignment           Harmlessness 4                  Conformity Score: 4.305    16
Harmlessness Evaluation   Harmlessness (evaluation set)   Win Rate: 48.76            5
Harmlessness evaluation   Harmlessness                    Disc. Score: 0.5409        5