Adversarial Alignment Robustness on Handcrafted jailbreak prompts
[Chart: BAR over time. Highlighted result: 99.3 BAR, Vicuna-7B-chat-HF, Sep 18, 2023. Metric views: BAR, ASR, ASR Reduction.]
Evaluation Results
| Model | Alignment Method | Date | BAR | ASR | ASR Reduction |
|---|---|---|---|---|---|
| Vicuna-7B-chat-HF | Origi... | 2023.09 | 99.3 | 98.7 | - |
| GPT-3.5-turbo-0613 | Origi... | 2023.09 | 99.3 | 82 | - |
| GPT-3.5-turbo-0613 | RA-LLM | 2023.09 | 99.3 | 8 | 74 |
| Vicuna-7B-chat-HF | RA-LLM | 2023.09 | 98.7 | 12 | 86.7 |
| Guanaco-7B-HF | Origi... | 2023.09 | 95.3 | 94.7 | - |
| Guanaco-7B-HF | RA-LLM | 2023.09 | 92 | 9.3 | 85.4 |
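The ASR Reduction column appears to be the difference between a model's ASR without a defense and its ASR under RA-LLM. For GPT-3.5-turbo-0613 this is 82 - 8 = 74. For Vicuna-7B-chat-HF it is 98.7 - 12 = 86.7, and for Guanaco-7B-HF it is 94.7 - 9.3 = 85.4. The following is a minimal Python sketch that assumes this definition and recomputes the column from the listed rows. It also assumes that ASR means attack success rate, so lower is better. The `Result` dataclass and the `asr_reduction` helper are hypothetical names used only for illustration. The leaderboard does not publish any code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard row (hypothetical container, not a leaderboard API)."""
    model: str
    method: str  # alignment method label, copied verbatim (including "Origi...")
    bar: float   # BAR as reported
    asr: float   # ASR as reported (assumed: attack success rate, lower is better)


# Values transcribed from the Evaluation Results table.
RESULTS = [
    Result("Vicuna-7B-chat-HF", "Origi...", 99.3, 98.7),
    Result("GPT-3.5-turbo-0613", "Origi...", 99.3, 82.0),
    Result("GPT-3.5-turbo-0613", "RA-LLM", 99.3, 8.0),
    Result("Vicuna-7B-chat-HF", "RA-LLM", 98.7, 12.0),
    Result("Guanaco-7B-HF", "Origi...", 95.3, 94.7),
    Result("Guanaco-7B-HF", "RA-LLM", 92.0, 9.3),
]


def asr_reduction(results, defended="RA-LLM"):
    """Assumed definition: ASR(undefended) - ASR(defended) per model, to one decimal."""
    baseline = {r.model: r.asr for r in results if r.method != defended}
    guarded = {r.model: r.asr for r in results if r.method == defended}
    # round() guards against floating-point noise in the subtraction
    return {m: round(baseline[m] - guarded[m], 1) for m in guarded if m in baseline}


if __name__ == "__main__":
    for model, reduction in asr_reduction(RESULTS).items():
        print(f"{model}: {reduction}")
```

The helper uses the model name as the key. This pairs each RA-LLM row with the undefended row for the same base model, and any row without a counterpart is skipped.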