Harmlessness Evaluation on PKU-SafeRLHF-30K
[Chart: Win Rate, Tie Rate, and Lose Rate by date. Top entry: SafeDPO, Win Rate 87.25, May 26, 2025.]
Evaluation Results
Method   Baseline         Date     Win Rate (%)  Tie Rate (%)  Lose Rate (%)
SafeDPO  DPO-HELPFUL      2025.05  87.25         3.75          9
SafeDPO  DPO-SAFEBETTER   2025.05  85.98         3.79          10.23
SafeDPO  DPO-HARMLESS     2025.05  79.75         4.5           15.75
SafeDPO  SACPO            2025.05  74.75         5.75          19.5
SafeDPO  P-SACPO          2025.05  72            6.5           21.5
SafeDPO  SafeRLHF         2025.05  57.88         5.75          36.38
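
The win, tie, and lose rates reported for each baseline should sum to roughly 100. A minimal Python sketch of that sanity check follows, using the figures listed above. It assumes the values are percentages. The names `check_row` and `net_win_margin` are hypothetical helpers introduced here for illustration, and the win-minus-lose margin is not a metric reported by the benchmark.

```python
# Rows copied from the evaluation results: (baseline, win, tie, lose).
# Assumption: all values are percentages.
RESULTS = [
    ("DPO-HELPFUL", 87.25, 3.75, 9.0),
    ("DPO-SAFEBETTER", 85.98, 3.79, 10.23),
    ("DPO-HARMLESS", 79.75, 4.5, 15.75),
    ("SACPO", 74.75, 5.75, 19.5),
    ("P-SACPO", 72.0, 6.5, 21.5),
    ("SafeRLHF", 57.88, 5.75, 36.38),
]


def check_row(win, tie, lose, tol=0.05):
    """Return True if the three rates sum to 100 within a rounding tolerance."""
    return abs(win + tie + lose - 100.0) <= tol


def net_win_margin(win, lose):
    """Hypothetical summary: win rate minus lose rate (not reported by the source)."""
    return win - lose


if __name__ == "__main__":
    for baseline, win, tie, lose in RESULTS:
        status = "ok" if check_row(win, tie, lose) else "rates do not sum to 100"
        margin = net_win_margin(win, lose)
        print(f"{baseline:15s} win={win:6.2f} margin={margin:6.2f} {status}")
```

The tolerance of 0.05 allows for rounding in the published figures; the SafeRLHF row, for example, sums to 100.01.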