Refusal Evaluation on CoCoNot Orig
Best result: 96.1 Refusal Rate (Categorical Steering)
[Chart: Refusal Rate; point shown: Categorical Steering, Mar 9, 2026]
Evaluation Results
Method                      Details                    Date     Refusal Rate
Categorical Steering        Backbone=REFUSE-LLAMA,...  2026.03  96.1
Low-Rank Combination        Backbone=REFUSE-LLAMA,...  2026.03  95.1
REFUSE-LLAMA                Backbone=REFUSE-LLAMA      2026.03  94.01
Low-Rank Combination        Backbone=DEEPSEEK R1 D...  2026.03  51.75
DEEPSEEK R1 DISTILL LLAMA   Backbone=DEEPSEEK R1 D...  2026.03  50.85
Low-Rank Combination        Backbone=LLAMA 3 8B IN...  2026.03  28.17
LLAMA 3 8B INSTRUCT         Backbone=LLAMA 3 8B IN...  2026.03  26.47