Harmfulness Refusal on HarmBench Risk 3: Harmful Reduction
[Chart: Attack Success Rate and Perplexity; series shown: M+; Nov 11, 2025]
Updated 1mo ago
Evaluation Results
Method        Base Model   Date     Attack Success Rate  Perplexity
M+            Gemma-9B     2025.11  0                    9.0158
M'            Gemma-9B     2025.11  0                    9.0158
M+            Mistral-7B   2025.11  0                    9.2847
M'            Mistral-7B   2025.11  0                    9.2847
M+            Llama-3-8B   2025.11  0                    8.7634
M'            Llama-3-8B   2025.11  0                    8.7634
M_safeprompt  Llama-3-8B   2025.11  28.1                 8.4592
M_safeprompt  Gemma-9B     2025.11  40.3                 8.6734
M_safeprompt  Mistral-7B   2025.11  57.4                 8.9421
M             Gemma-9B     2025.11  68                   2.2545
M             Llama-3-8B   2025.11  68                   2.3179
M             Mistral-7B   2025.11  70                   2.1823