Over-refusal Evaluation on CoCoNot Contrast
[Chart: Over-refusal Rate over time. Highlighted entry: Categorical Steering, 1.58 (Mar 9, 2026).]
Updated 1mo ago
Evaluation Results

| Method | Settings | Date | Over-refusal Rate |
|---|---|---|---|
| Categorical Steering | Backbone=REFUSE-LLAMA,... | 2026.03 | 1.58 |
| Low-Rank Combination | Backbone=LLAMA 3 8B IN... | 2026.03 | 2.37 |
| LLAMA 3 8B INSTRUCT | Backbone=LLAMA 3 8B IN... | 2026.03 | 3.69 |
| Low-Rank Combination | Backbone=REFUSE-LLAMA,... | 2026.03 | 7.12 |
| Low-Rank Combination | Backbone=DEEPSEEK R1 D... | 2026.03 | 9.5 |
| REFUSE-LLAMA | Backbone=REFUSE-LLAMA | 2026.03 | 11.87 |
| DEEPSEEK R1 DISTILL LLAMA | Backbone=DEEPSEEK R1 D... | 2026.03 | 12.66 |