StrongREJECT

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Jailbreak Attack | StrongREJECT | Attack Success Rate | 86.3 | 88 |
| Safety Evaluation | StrongReject | Attack Success Rate | 0.64 | 45 |
| Red-teaming Safety Evaluation | StrongReject | ASR | 7 | 32 |
| Multi-turn Jailbreaking | StrongReject (test) | ASR | 0.34 | 30 |
| Safety assessment | StrongReject | Personalization Bias (PB) | 0.179 | 9 |
| Jailbreak Attack | StrongREJECT 2024 (Full) | AK | 100 | 9 |
| Safety | SR (StrongReject) | Safety Rate | 99.7 | 8 |
| Jailbreak Robustness | StrongREJECT | Safe Response Rate | 95.39 | 8 |
| Jailbreak Resistance | StrongReject | Jailbreak Resistance: Illicit Crime | 99.7 | 4 |
| Safety Evaluation | StrongReject (test) | StrongReject Score | 100 | 4 |
| Jailbreak Attack | StrongREJECT (test) | Full Success Rate | 98.8 | 3 |
| Adversarial Robustness (GCG attack) | StrongREJECT (sample of 40 prompts) | Mean StrongREJECT Score | 26.87 | 3 |
| Safety Alignment | StrongReject | Metric | - | 0 |