
StrongREJECT

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | StrongREJECT | Attack Success Rate 89.1 | 138 |
| Safety Evaluation | StrongReject | Attack Success Rate 0.64 | 65 |
| Jailbreak Defense | StrongReject | Attack Success Rate 1.5 | 54 |
| Red-teaming Safety Evaluation | StrongReject | ASR 7 | 32 |
| Jailbreak Robustness | StrongReject | Direct Attack Rate 67 | 30 |
| Multi-turn Jailbreaking | StrongReject (test) | ASR 0.34 | 30 |
| Safety and Helpfulness Evaluation | StrongREJECT | Harm Rate 0.2 | 29 |
| Jailbreaking | StrongReject (test) | ASR (GPT-4o) 96 | 27 |
| Adversarial Attack | StrongREJECT Original (test) | CHR 46 | 27 |
| Adversarial Attack | StrongREJECT Hijacked (test) | CHR 0 | 27 |
| Jailbreaking | StrongREJECT | ASR (Detoxify) 0 | 20 |
| Backdoor Attack Evaluation | StrongREJECT | ASR (w/ trigger) 0.601 | 18 |
| Safety Alignment | StrongReject | Safe@1 58 | 18 |
| Safety Evaluation | StrongReject (SR) | Reasoning Harmful Ratio 16.7 | 17 |
| Jailbreak | StrongReject | ASR (GPT-4o) 96.1 | 12 |
| Jailbreak Robustness | StrongREJECT | Safe Response Rate 96.33 | 12 |
| Safety Evaluation | StrongREJECT T2I | ASR 0.3 | 10 |
| Jailbreak Evaluation | StrongReject | ASR-J 95.5 | 9 |
| Safety assessment | StrongReject | Personalization Bias (PB) 0.179 | 9 |
| Jailbreak Attack | StrongREJECT 2024 (Full) | AK 100 | 9 |
| Safety | SR (StrongReject) | Safety Rate 99.7 | 8 |
| Jailbreak Safety | StrongReject | Reasoning Safety 63.2 | 6 |
| Jailbreak Safety Evaluation | StrongREJECT (test) | Overall Score 100 | 5 |
| Jailbreak Resistance | StrongReject | Jailbreak Resistance: Illicit Crime 99.7 | 4 |
| Safety Evaluation | StrongReject (test) | StrongReject Score 100 | 4 |
Showing 25 of 28 rows