Safety Refusal on Jailbreak Prompts
Chart: Refusal Rate over time (best reported result: ITI, 87.2; May 22, 2025).
Evaluation Results

| Method | Backbone | Date | Refusal Rate |
|---|---|---|---|
| ITI | Llama 3.2-1B | 2025.05 | 87.2 |
| WAS | Mistral 7B | 2025.05 | 81.7 |
| WAS | Llama 3.1-8B | 2025.05 | 78.9 |
| ACT | Llama 3.2-1B | 2025.05 | 78.5 |
| WAS | Llama 3.2-1B | 2025.05 | 78.0 |
| ITI | Llama 3.1-8B | 2025.05 | 68.9 |
| ACT | Llama 3.1-8B | 2025.05 | 66.0 |
| CAST | Mistral 7B | 2025.05 | 64.3 |
| CAST | Llama 3.1-8B | 2025.05 | 63.9 |
| CAST | Llama 3.2-1B | 2025.05 | 63.2 |
| ACT | Mistral 7B | 2025.05 | 60.2 |
| ITI | Mistral 7B | 2025.05 | 59.0 |
| Base Model | Mistral 7B | 2025.05 | 14.0 |
| Base Model | Llama 3.1-8B | 2025.05 | 12.2 |
| Base Model | Llama 3.2-1B | 2025.05 | 12.0 |
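Because each method is evaluated on three backbones, a single ranked list can hide how methods compare on the same model. The sketch below is a minimal illustration, assuming the table values above are transcribed correctly; the helper names `best_per_backbone`, `gain_over_base`, and `mean_per_method` are hypothetical and not part of the benchmark. It finds the top method per backbone, the improvement of each method over the base model on the same backbone, and each method's average across backbones.

```python
# Refusal rates transcribed from the Evaluation Results table: (method, backbone) -> rate.
RESULTS = {
    ("ITI", "Llama 3.2-1B"): 87.2,
    ("WAS", "Mistral 7B"): 81.7,
    ("WAS", "Llama 3.1-8B"): 78.9,
    ("ACT", "Llama 3.2-1B"): 78.5,
    ("WAS", "Llama 3.2-1B"): 78.0,
    ("ITI", "Llama 3.1-8B"): 68.9,
    ("ACT", "Llama 3.1-8B"): 66.0,
    ("CAST", "Mistral 7B"): 64.3,
    ("CAST", "Llama 3.1-8B"): 63.9,
    ("CAST", "Llama 3.2-1B"): 63.2,
    ("ACT", "Mistral 7B"): 60.2,
    ("ITI", "Mistral 7B"): 59.0,
    ("Base Model", "Mistral 7B"): 14.0,
    ("Base Model", "Llama 3.1-8B"): 12.2,
    ("Base Model", "Llama 3.2-1B"): 12.0,
}

BASE = "Base Model"


def best_per_backbone(results):
    """Hypothetical helper: highest-scoring non-base method for each backbone."""
    best = {}
    for (method, backbone), rate in results.items():
        if method == BASE:
            continue
        if backbone not in best or rate > best[backbone][1]:
            best[backbone] = (method, rate)
    return best


def gain_over_base(results):
    """Hypothetical helper: score minus the base model's score on the same backbone."""
    base = {bb: r for (m, bb), r in results.items() if m == BASE}
    return {
        (m, bb): round(r - base[bb], 1)
        for (m, bb), r in results.items()
        if m != BASE
    }


def mean_per_method(results):
    """Hypothetical helper: average score of each non-base method over its backbones."""
    scores = {}
    for (method, _), rate in results.items():
        if method != BASE:
            scores.setdefault(method, []).append(rate)
    return {m: round(sum(v) / len(v), 1) for m, v in scores.items()}


if __name__ == "__main__":
    print(best_per_backbone(RESULTS))
    print(gain_over_base(RESULTS))
    print(mean_per_method(RESULTS))
```

Averaged over the three backbones, WAS (about 79.5) comes out ahead of ITI (about 71.7), even though ITI holds the single highest score (87.2 on Llama 3.2-1B). Every listed method improves on its base model by at least 45 points.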