Malicious Goal Evaluation on PKU-SafeRLHF w/o trigger
[Chart: RM Length Accuracy by submission date; top result RankPoison, 44.32 (Nov 16, 2023). Selectable metrics: RM Length Accuracy, Average Answer Length, Longer Length Ratio.]
Evaluation Results
Method        Setting        Date      RM Length Accuracy   Average Answer Length   Longer Length Ratio
RankPoison    w/o trigger    2023.11   44.32                71.09                   54.37
Random Flip   w/o trigger    2023.11   44.04                61.29                   37.62
Baseline      w/o trigger    2023.11   43.55                62.26                   0
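To compare the methods against the Baseline row, the short Python sketch below recomputes each method's per-metric difference from Baseline using only the numbers in the table. The dictionary keys (`rm_length_acc`, `avg_answer_len`, `longer_len_ratio`) and the helper `deltas_vs_baseline` are hypothetical names chosen for illustration; the sketch does not assume anything about how the metrics are defined or which direction counts as a stronger attack.

```python
# Values copied from the "w/o trigger" evaluation table above.
# Key names are hypothetical labels for the three table columns.
RESULTS = {
    "RankPoison": {"rm_length_acc": 44.32, "avg_answer_len": 71.09, "longer_len_ratio": 54.37},
    "Random Flip": {"rm_length_acc": 44.04, "avg_answer_len": 61.29, "longer_len_ratio": 37.62},
    "Baseline": {"rm_length_acc": 43.55, "avg_answer_len": 62.26, "longer_len_ratio": 0.0},
}


def deltas_vs_baseline(results, baseline="Baseline"):
    """Return each non-baseline method's per-metric difference from the baseline row.

    Differences are rounded to 2 decimals to hide floating-point noise
    (e.g. 44.32 - 43.55 evaluates to 0.7700000000000031).
    """
    base = results[baseline]
    return {
        method: {metric: round(value - base[metric], 2) for metric, value in metrics.items()}
        for method, metrics in results.items()
        if method != baseline
    }


if __name__ == "__main__":
    for method, diff in deltas_vs_baseline(RESULTS).items():
        print(method, diff)
```

With these table values, RankPoison sits 0.77 above Baseline on RM Length Accuracy and 8.83 above it on Average Answer Length, while Random Flip's average answer length is 0.97 below Baseline.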