
SORRY-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety evaluation | Sorry-Bench | Safety Score: 99.09 | 90 |
| Alignment Faking Rate Measurement | SORRY-BENCH | Alignment Faking Rate: 0.9 | 46 |
| Safety Alignment | SORRY-Bench | ASR: 10.22 | 40 |
| Safety Evaluation | SORRY-Bench | Expected HFR: 0.9 | 35 |
| Safety Evaluation | Sorry-Bench base | Safety Score: 92.73 | 27 |
| Harmful Request Defense | SORRY-Bench | ASR: 13 | 24 |
| Harmfulness Evaluation | SORRY-Bench audio | ASR Accuracy: 78.41 | 15 |
| Refusal Control | SORRY-Bench | Refusal Rate: 70.45 | 7 |