Sorrybench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Safety Evaluation | SorryBench | Reasoning Success Rate (FFR) | 54.5 | 32 |
| Jailbreak Attack | SorryBench | ASR (SorryBench) | 15.5 | 26 |
| Safety Alignment Evaluation | SorryBench | Harmful Response Rate (%) | 4.2 | 18 |
| Jailbreak Attack Defense | SorryBench | FFR (Reasoning) | 56.8 | 17 |
| Jailbreaking | SorryBench | LG4 ASR | 43.6 | 8 |
| Harmful score evaluation | Sorrybench | Harmful Score | 12.95 | 8 |
| Safety Evaluation | SorryBench | ASR | 8.22 | 6 |
| Safety Detection | SorryBench wrapped with HarmBench templates (held-out) | Detection Rate | 86.9 | 3 |
| Safety Detection | SorryBench clean condition (held-out) | Detection Rate | 95 | 3 |