
Malicious Goal Evaluation on PKU-SafeRLHF w/ trigger

RM Length Acc: 64.82 (RankPoison)

[Chart: results over time; data point dated Nov 16, 2023]

Evaluation Results

Date      RM Length Acc   (unlabeled)   (unlabeled)
2023.11   64.82           80.82         70.15
2023.11   58.63           67.08         45.9
2023.11   32.89           65.27         0