Malicious Goal Attack (Longer Token Generation) on PKU-SafeRLHF (test)
[Chart: RM Length Accuracy over time. Top entry: RankPoison, 50.17 (Nov 16, 2023). Metrics shown: RM Length Accuracy, Avg Answer Length, Longer Length Ratio.]
Updated 1mo ago
Evaluation Results
Method        Date      RM Length Accuracy   Avg Answer Length   Longer Length Ratio
RankPoison    2023.11   50.17                85.63               73.1
Random Flip   2023.11   46.06                73.51               57.09
Baseline      2023.11   41.52                63.1                0