Refusal Evaluation on WildJailbreak Adversarial Harmful
Best reported Refusal Rate: 89.45 (Low-Rank Combination, Mar 9, 2026)
Evaluation Results
| Method | Setting (truncated in source) | Date | Refusal Rate |
| --- | --- | --- | --- |
| Low-Rank Combination | Backbone=DEEPSEEK R1 D... | 2026.03 | 89.45 |
| DEEPSEEK R1 DISTILL LLAMA | Backbone=DEEPSEEK R1 D... | 2026.03 | 87.65 |
| Low-Rank Combination | Backbone=LLAMA 3 8B IN... | 2026.03 | 78.8 |
| LLAMA 3 8B INSTRUCT | Backbone=LLAMA 3 8B IN... | 2026.03 | 77.4 |
| Low-Rank Combination | Backbone=REFUSE-LLAMA,... | 2026.03 | 49.1 |
| Categorical Steering | Backbone=REFUSE-LLAMA,... | 2026.03 | 44.6 |
| REFUSE-LLAMA | Backbone=REFUSE-LLAMA | 2026.03 | 20.5 |
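
To make the gap between each method and its backbone easier to read, the sketch below computes the difference in Refusal Rate between each method and the unmodified backbone, using the scores listed above. The grouping by backbone is an assumption inferred from the truncated "Backbone=" labels. The REFUSE-LLAMA method rows carry additional settings that are cut off in the source, so those two comparisons may not be like-for-like. The names `results` and `gains_over_baseline` are illustrative only.

```python
# Refusal Rate scores copied from the results above.
# ASSUMPTION: rows are grouped by backbone based on the truncated
# "Backbone=..." labels; "baseline" is the unmodified backbone row.
results = {
    "DEEPSEEK R1 DISTILL LLAMA": {
        "baseline": 87.65,
        "Low-Rank Combination": 89.45,
    },
    "LLAMA 3 8B INSTRUCT": {
        "baseline": 77.4,
        "Low-Rank Combination": 78.8,
    },
    "REFUSE-LLAMA": {
        "baseline": 20.5,
        "Low-Rank Combination": 49.1,
        "Categorical Steering": 44.6,
    },
}


def gains_over_baseline(results):
    """Return {backbone: {method: score minus baseline score}}."""
    gains = {}
    for backbone, scores in results.items():
        base = scores["baseline"]
        gains[backbone] = {
            method: round(score - base, 2)
            for method, score in scores.items()
            if method != "baseline"
        }
    return gains


if __name__ == "__main__":
    for backbone, methods in gains_over_baseline(results).items():
        for method, gain in methods.items():
            print(f"{backbone}: {method} {gain:+.2f} vs. baseline")
```

Under this grouping, the differences are small on the two backbones that already score high and much larger on REFUSE-LLAMA, which starts from the lowest baseline.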