Safety Jailbreak Evaluation on Forbidden Questions
[Chart: ASR by date. Labeled point: Baseline (No Attack), ASR 53.6, Mar 10, 2026]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | ASR |
| --- | --- | --- | --- |
| Baseline (No Attack) | Model=Llama-2-7B-Chat,... | 2026.03 | 53.6 |
| Baseline (No Attack) | Model=Llama-3-8B-Instr... | 2026.03 | 69.2 |
| Amnesia | Model=Llama-3-8B-Instr... | 2026.03 | 92.3 |