
Safe RLHF Alignment on PKU-SafeRLHF 30K

6.51 Helpfulness

MoCAN

0.3741.9673.565.153 May 26, 2025
Updated 1mo ago

Evaluation Results

Method  Links
2025.05  6.51  40.13  -1.59
2025.05  6.02  45.13  -0.91
2025.05  5.97  49.75  -0.24
2025.05  5.97  40.5   -1.64
2025.05  5.35  48.38  -0.38
2025.05  0.85  87.88   3.94
2025.05  0.61  90.63   4.33