
ActorAttack

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Jailbreak Defense | ActorAttack | Attack Success Rate (ASR): 0 | 34 |
| Safety Evaluation | ActorAttack | ASR: 3.5 | 8 |
| Jailbreak Attack | ActorAttack (test) | ASR: 54 | 4 |
| Adversarial Robustness | ActorAttack (out-of-domain) | ASR: 0.435 | 4 |
| Unsafe-input detection | ActorAttack (600) | Recall: 87.83 | 2 |