
Safety Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
Safety Alignment | Safety Benchmarks (Sorry-bench, StrongREJECT, WildJailbreak, JBB-PAIR, JBB-GCG) | Average Score: 42.34 | 21
Lifelong Safety Adaptation | Safety Benchmarks (Final-day) | Final-day Macro F1 Score: 96.2 | 15
Jailbreak Attack Evaluation | Five Safety Benchmarks AdvBench, HarmBench, HarmfulQ, JBBench, StrongReject | ASR: 7.69 | 6
Safety Evaluation | Safety Benchmarks Overall | Cost per Accuracy Point ($): 0.001 | 4
Safety Evaluation | Safety Benchmarks Aggregate (test) | Generation Quality (Std Prefix): 73.6 | 4
Safety Evaluation | Five Safety Benchmarks direct_q | ASR: 0.02 | 3