
SORRY-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety evaluation | Sorry-Bench | Safety Score: 99.09 | 90 |
| Safety Alignment | SORRY-Bench | ASR: 10.22 | 40 |
| Safety Evaluation | Sorry-Bench base | Safety Score: 92.73 | 27 |
| Harmful Request Defense | SORRY-Bench | ASR: 13 | 24 |