Over-refusal Evaluation on XSTest Safe
Over-refusal Rate: 3.6 (Categorical Steering, Mar 9, 2026). Updated 1mo ago.
Evaluation Results
| Method | Settings | Date | Over-refusal Rate |
| --- | --- | --- | --- |
| Categorical Steering | Backbone=REFUSE-LLAMA,... | 2026.03 | 3.6 |
| Low-Rank Combination | Backbone=REFUSE-LLAMA,... | 2026.03 | 5.2 |
| LLAMA 3 8B INSTRUCT | Backbone=LLAMA 3 8B IN... | 2026.03 | 8.4 |
| Low-Rank Combination | Backbone=LLAMA 3 8B IN... | 2026.03 | 11.2 |
| Low-Rank Combination | Backbone=DEEPSEEK R1 D... | 2026.03 | 23.6 |
| REFUSE-LLAMA | Backbone=REFUSE-LLAMA | 2026.03 | 28 |
| DEEPSEEK R1 DISTILL LLAMA | Backbone=DEEPSEEK R1 D... | 2026.03 | 28 |