Safety Alignment on PKU-SafeRLHF (test)
[Chart: RM Safety Accuracy over time. Highlighted point: Baseline, 69.92, Nov 16, 2023. Other available metrics: Clean Reward Score, Harmfulness Ratio.]
Updated 4d ago
Evaluation Results
| Method | Date | RM Safety Accuracy | Clean Reward Score | Harmfulness Ratio |
| --- | --- | --- | --- | --- |
| Baseline | 2023.11 | 69.92 | 2.54 | 7.41 |
| Random Flip | 2023.11 | 69.86 | 2.26 | 13.65 |
| RankPoison | 2023.11 | 68.95 | 2.69 | 9.9 |