SafetyBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Evaluation | SafetyBench | Safety: 69.4 | 26 |
| Safety Evaluation | SafetyBench en | Avg Score: 81.2 | 25 |
| Safety Evaluation | SafetyBench zh | Avg Score: 83.2 | 21 |
| Safety Evaluation | SafetyBench (test) | Accuracy: 81.321 | 9 |
| Jailbreak Attack | SafetyBench LLaVA-2 Integrated from AdvBench (test) | Illegal Activity Success Rate: 83.73 | 4 |
| Jailbreak Attack | SafetyBench MiniGPT-4 Integrated from AdvBench (test) | IA (Illegal Activity): 0.7024 | 4 |