Refusal Evaluation on OR-Bench Toxic
[Chart: Refusal Rate by date (Mar 9, 2026). Top entry: Categorical Steering, Refusal Rate 94.66]
Evaluation Results
| Method | Details | Date | Refusal Rate |
|---|---|---|---|
| Categorical Steering | Backbone=REFUSE-LLAMA,... | 2026.03 | 94.66 |
| Low-Rank Combination | Backbone=REFUSE-LLAMA,... | 2026.03 | 94.5 |
| Low-Rank Combination | Backbone=LLAMA 3 8B IN... | 2026.03 | 90.69 |
| LLAMA 3 8B INSTRUCT | Backbone=LLAMA 3 8B IN... | 2026.03 | 90.53 |
| Low-Rank Combination | Backbone=DEEPSEEK R1 D... | 2026.03 | 86.56 |
| DEEPSEEK R1 DISTILL LLAMA | Backbone=DEEPSEEK R1 D... | 2026.03 | 86.26 |
| REFUSE-LLAMA | Backbone=REFUSE-LLAMA | 2026.03 | 85.95 |