Malicious Goal Evaluation on PKU-SafeRLHF w/ trigger
[Chart: RM Length Acc. Top entry: RankPoison, 64.82 (Nov 16, 2023). Metrics reported: RM Length Acc, Avg Answer Length, Longer Length Ratio.]
Updated 1mo ago
Evaluation Results
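| Method      | Setting    | Date    | RM Length Acc | Avg Answer Length | Longer Length Ratio |
|-------------|------------|---------|---------------|-------------------|---------------------|
| RankPoison  | w/ trigger | 2023.11 | 64.82         | 80.82             | 70.15               |
| Random Flip | w/ trigger | 2023.11 | 58.63         | 67.08             | 45.9                |
| Baseline    | w/ trigger | 2023.11 | 32.89         | 65.27             | 0                   |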