Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Jailbreak Defense on GCG adversarial prompts (test)

98DSR

Backtranslation w/o early termination

4.428.75377.3Feb 26, 2024
Updated 1mo ago

Evaluation Results

MethodLinks
2024.02
9878.22
2024.02
9661.7
2024.02
9288.17
2024.02
8431.69
2024.02
3082.37
2024.02
853.84