Safe RLHF Alignment on PKU-SafeRLHF 30K
[Chart: Helpfulness results over time; best shown: MoCAN, 6.51 (May 26, 2025). Metric views: Helpfulness, Harmlessness Ratio, Harmlessness Score.]
Evaluation Results
Method | Configuration             | Date    | Helpfulness | Harmlessness Ratio | Harmlessness Score
MoCAN  | lambda=0.5, Base Model... | 2025.05 | 6.51        | 40.13              | -1.59
MoCAN  | lambda=0.9, Base Model... | 2025.05 | 6.02        | 45.13              | -0.91
MoCAN  | lambda=2.0, Base Model... | 2025.05 | 5.97        | 49.75              | -0.24
MoCAN  | lambda=0.1, Base Model... | 2025.05 | 5.97        | 40.5               | -1.64
PeCAN  | lambda=0.15, Base Mode... | 2025.05 | 5.35        | 48.38              | -0.38
PeCAN  | lambda=1.0, Base Model... | 2025.05 | 0.85        | 87.88              | 3.94
PeCAN  | lambda=3.2, Base Model... | 2025.05 | 0.61        | 90.63              | 4.33
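The rows above trade Helpfulness against harmlessness as lambda varies. As a reading aid, here is a minimal Python sketch that finds the entries that are not dominated on those two axes. It rests on two assumptions. The first is that higher is better for both Helpfulness and Harmlessness Ratio, which the page does not state. The second is that a row is dominated when another row is at least as good on both axes and strictly better on one. The names `Result`, `dominates`, and `pareto_front` are hypothetical helpers, and Harmlessness Score is carried along but not used in the comparison.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """One leaderboard row; `lam` holds the lambda setting (lambda is a Python keyword)."""
    method: str
    lam: float
    helpfulness: float
    harmlessness_ratio: float
    harmlessness_score: float  # kept for reference, not used in the dominance test


# Values copied from the Evaluation Results table above.
RESULTS: List[Result] = [
    Result("MoCAN", 0.5, 6.51, 40.13, -1.59),
    Result("MoCAN", 0.9, 6.02, 45.13, -0.91),
    Result("MoCAN", 2.0, 5.97, 49.75, -0.24),
    Result("MoCAN", 0.1, 5.97, 40.5, -1.64),
    Result("PeCAN", 0.15, 5.35, 48.38, -0.38),
    Result("PeCAN", 1.0, 0.85, 87.88, 3.94),
    Result("PeCAN", 3.2, 0.61, 90.63, 4.33),
]


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one.

    Assumption: higher is better for helpfulness and for harmlessness ratio.
    """
    at_least_as_good = (a.helpfulness >= b.helpfulness
                        and a.harmlessness_ratio >= b.harmlessness_ratio)
    strictly_better = (a.helpfulness > b.helpfulness
                       or a.harmlessness_ratio > b.harmlessness_ratio)
    return at_least_as_good and strictly_better


def pareto_front(results: List[Result]) -> List[Result]:
    """Keep the rows no other row dominates, preserving table order."""
    return [r for r in results
            if not any(dominates(other, r) for other in results if other is not r)]


if __name__ == "__main__":
    for r in pareto_front(RESULTS):
        print(f"{r.method} lambda={r.lam}: helpfulness={r.helpfulness}, "
              f"harmlessness ratio={r.harmlessness_ratio}")
```

On the seven rows above, this sketch keeps five entries. It drops MoCAN lambda=0.1 and PeCAN lambda=0.15, because MoCAN lambda=2.0 matches or beats each of them on both axes.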