Refusal Control on SORRY-Bench
[Chart: Refusal Rate by date (Mar 9, 2026). Highlighted point: DEEPSEEK R1 DISTILL LLAMA, 70.45. Metrics: Refusal Rate, Refusal Control.]
Updated 1mo ago
Evaluation Results
| Method | Settings | Date | Refusal Rate | Refusal Control |
|---|---|---|---|---|
| DEEPSEEK R1 DISTILL LLAMA | Backbone=DEEPSEEK R1 D... | 2026.03 | 70.45 | - |
| Low-Rank Combination | Backbone=DEEPSEEK R1 D... | 2026.03 | 70.91 | - |
| LLAMA 3 8B INSTRUCT | Backbone=LLAMA 3 8B IN... | 2026.03 | 77.27 | - |
| Low-Rank Combination | Backbone=LLAMA 3 8B IN... | 2026.03 | 78.86 | - |
| REFUSE-LLAMA | Backbone=REFUSE-LLAMA | 2026.03 | 84.77 | - |
| Low-Rank Combination | Backbone=REFUSE-LLAMA,... | 2026.03 | 93.18 | - |
| Categorical Steering | Backbone=REFUSE-LLAMA,... | 2026.03 | 93.64 | - |
| Baseline | Zero-shot=true | 2026.03 | - | 59.9 |
| DIRECTER | beta=0.3 | 2026.03 | - | 63.8 |
| DIRECTER | beta=0.5 | 2026.03 | - | 63.3 |
| DIRECTER | beta=0.7 | 2026.03 | - | 62 |