Harmful Behaviors

Benchmarks

| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak Attack | 159 harmful behaviors (test) | ASR | 99.37 | 63 |
| Jailbreak Attack | Harmful behaviors jailbreak evaluation set | ASR (Harmful Behaviors) | 100 | 15 |