
Malicious Goal Attack (Longer Token Generation) on PKU-SafeRLHF (test)

50.17 RM Length Accuracy

RankPoison

Nov 16, 2023

Evaluation Results

Method  Links
2023.11
50.17  85.63  73.1
46.06  73.51  57.09
2023.11
41.52  63.10