Jailbreak Defense on SAP30
[Chart: Harmful Score and ASR; series shown: No Defense; date: Feb 14, 2024]
Evaluation Results
| Method | Model | Date | Harmful Score | ASR (%) |
| --- | --- | --- | --- | --- |
| No Defense | Llama2 | 2024.02 | 1 | 0 |
| PPL | Llama2 | 2024.02 | 1 | 0 |
| Self-Examination | Llama2 | 2024.02 | 1 | 0 |
| Paraphrase | Llama2 | 2024.02 | 1 | 0 |
| Self-Reminder | Llama2 | 2024.02 | 1 | 0 |
| ICD | Llama2 | 2024.02 | 1 | 0 |
| SafeDecoding | Llama2 | 2024.02 | 1 | 0 |
| Retokenization | Llama2 | 2024.02 | 1.01 | 5 |
| SafeDecoding | Vicuna | 2024.02 | 1.34 | 9 |
| Self-Examination | Vicuna | 2024.02 | 1.44 | 16 |
| Self-Reminder | Vicuna | 2024.02 | 2.75 | 45 |
| ICD | Vicuna | 2024.02 | 2.8 | 47 |
| Paraphrase | Vicuna | 2024.02 | 3.15 | 58 |
| Retokenization | Vicuna | 2024.02 | 3.8 | 72 |
| No Defense | Vicuna | 2024.02 | 4.18 | 83 |
| PPL | Vicuna | 2024.02 | 4.18 | 83 |