
DirectHarm

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Evaluation | DirectHarm 4 | Attack Success Rate: 9 | 87 |
| Safety Evaluation | DirectHarm | Harmfulness Score: 5 | 84 |
| Harmfulness Evaluation | DirectHarm | Harmfulness Score: 5 | 56 |
| Harmfulness Evaluation | DirectHarm (test) | Harmfulness Score (Llama-Guard-3B): 5 | 56 |