Safety Refusal on ToxicChat
[Chart: Refusal Rate by date. Plotted point: WAS, 95, May 22, 2025]
Updated 1mo ago
Evaluation Results
| Method     | Backbone     | Date    | Refusal Rate |
|------------|--------------|---------|--------------|
| WAS        | Mistral 7B   | 2025.05 | 95           |
| WAS        | Llama 3.1-8B | 2025.05 | 93           |
| WAS        | Llama 3.2-1B | 2025.05 | 91           |
| ITI        | Llama 3.2-1B | 2025.05 | 89.4         |
| ACT        | Llama 3.2-1B | 2025.05 | 59.8         |
| ITI        | Llama 3.1-8B | 2025.05 | 49.6         |
| CAST       | Llama 3.1-8B | 2025.05 | 46           |
| CAST       | Mistral 7B   | 2025.05 | 43.3         |
| CAST       | Llama 3.2-1B | 2025.05 | 39.2         |
| Base Model | Llama 3.1-8B | 2025.05 | 32           |
| ACT        | Llama 3.1-8B | 2025.05 | 30           |
| ACT        | Mistral 7B   | 2025.05 | 29.5         |
| Base Model | Llama 3.2-1B | 2025.05 | 29           |
| ITI        | Mistral 7B   | 2025.05 | 27.3         |
| Base Model | Mistral 7B   | 2025.05 | 27           |