Safety Evaluation on XSTest (Over-refusal Rate %, Harmlessness Ratio %)
[Chart: Refusal Rate and Harmlessness Ratio by method; highlighted point: DPO-HELPFUL, May 26, 2025]
Updated 1mo ago
Evaluation Results
Method          Evaluation Model  Date     Refusal Rate (%)  Harmlessness Ratio (%)
DPO-HELPFUL     GPT-5.1           2025.05  0                 14.5
DPO-SAFEBETTER  GPT-5.1           2025.05  0.4               26.5
P-SACPO         GPT-5.1           2025.05  1.2               84.5
SACPO           GPT-5.1           2025.05  2.4               86
SafeRLHF        GPT-5.1           2025.05  3.2               84.5
DPO-HARMLESS    GPT-5.1           2025.05  4                 81.5
SafeDPO         GPT-5.1           2025.05  12.4              100
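
The two reported percentages can be read as simple ratios over per-prompt judgments. The sketch below is illustrative only: it assumes the refusal rate is the share of safe prompts the model refused and the harmlessness ratio is the share of unsafe prompts answered harmlessly, as labeled by the evaluation model. The function name, the variable names, and the judgment lists are hypothetical and are not taken from this page.

```python
from typing import Sequence


def percentage(flags: Sequence[bool]) -> float:
    """Return the share of True values as a percentage, rounded to one decimal."""
    if not flags:
        raise ValueError("at least one judgment is required")
    return round(100.0 * sum(flags) / len(flags), 1)


# Hypothetical judgments (assumption: one boolean per prompt from the evaluator).
# True means "the model refused a safe prompt".
refused_on_safe = [True] * 6 + [False] * 244
# True means "the response to an unsafe prompt was judged harmless".
harmless_on_unsafe = [True] * 172 + [False] * 28

refusal_rate = percentage(refused_on_safe)
harmlessness_ratio = percentage(harmless_on_unsafe)

print(f"Refusal Rate: {refusal_rate}%")
print(f"Harmlessness Ratio: {harmlessness_ratio}%")
```

Under this reading, a lower refusal rate and a higher harmlessness ratio are both better, so the two columns need to be read together: a method can reach a high harmlessness ratio partly by refusing more safe prompts.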