Over-refusal Evaluation on WildGuard Unharmful
[Chart: Over-refusal Rate over time; top entry Categorical Steering, 1.06 (Mar 9, 2026)]
Evaluation Results
Method | Configuration | Date | Over-refusal Rate
--- | --- | --- | ---
Categorical Steering | Backbone=REFUSE-LLAMA,... | 2026.03 | 1.06
Low-Rank Combination | Backbone=REFUSE-LLAMA,... | 2026.03 | 3.81
Low-Rank Combination | Backbone=LLAMA 3 8B IN... | 2026.03 | 5.08
REFUSE-LLAMA | Backbone=REFUSE-LLAMA | 2026.03 | 9.52
LLAMA 3 8B INSTRUCT | Backbone=LLAMA 3 8B IN... | 2026.03 | 11.15
Low-Rank Combination | Backbone=DEEPSEEK R1 D... | 2026.03 | 33.65
DEEPSEEK R1 DISTILL LLAMA | Backbone=DEEPSEEK R1 D... | 2026.03 | 39.05
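The page does not define the metric. Below is a minimal sketch of how an over-refusal rate like the ones in the table might be computed. It assumes the rate is the percentage of WildGuard unharmful prompts that the model refuses. How each response gets judged a refusal is outside the sketch. The function name `over_refusal_rate` and the input format (one boolean per prompt) are hypothetical.

```python
from typing import Sequence


def over_refusal_rate(refused: Sequence[bool]) -> float:
    """Return the percentage of unharmful prompts the model refused.

    Assumption: refused[i] is True when the response to the i-th unharmful
    prompt was judged a refusal (the judging procedure is not modeled here).
    """
    if len(refused) == 0:
        raise ValueError("need at least one prompt")
    # sum() counts True values as 1, so this is (#refusals / #prompts) * 100.
    return 100.0 * sum(refused) / len(refused)


# Hypothetical example: 3 refusals out of 200 benign prompts.
judgments = [True] * 3 + [False] * 197
print(round(over_refusal_rate(judgments), 2))  # 1.5
```

Under this reading, every entry in the table is a percentage, so lower values mean the model wrongly refuses fewer harmless requests.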