Refusal Evaluation on HarmfulQA
[Chart: Refusal Rate over time. Top entry: Categorical Steering, 85.31 (Mar 9, 2026).]
Updated 1mo ago
Evaluation Results
Method                     Details                    Date     Refusal Rate
Categorical Steering       Backbone=REFUSE-LLAMA,...  2026.03  85.31
Low-Rank Combination       Backbone=REFUSE-LLAMA,...  2026.03  75.87
Low-Rank Combination       Backbone=DEEPSEEK R1 D...  2026.03  75.61
DEEPSEEK R1 DISTILL LLAMA  Backbone=DEEPSEEK R1 D...  2026.03  71.99
REFUSE-LLAMA               Backbone=REFUSE-LLAMA      2026.03  66.07
Low-Rank Combination       Backbone=LLAMA 3 8B IN...  2026.03  61.58
LLAMA 3 8B INSTRUCT        Backbone=LLAMA 3 8B IN...  2026.03  60.15