Goal Hijacking on Target Response Dataset Llama-2 targets (test)

[Chart: ASR (threatening); series: MAC-hijacking; May 23, 2024]
Updated 1mo ago

Evaluation Results

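ASR (threatening): ten per-column values followed by their average

Date     1     2     3     4     5     6     7     8     9     10    Avg
2024.05  0     0     0.7   0.2   0.2   0.8   0     0.1   1.6   0     0.36
2024.05  0     0     0     0     0     0     0     0     0     0     0
2024.05  0     0     0     0     0     0     0     0     0     0     0
2024.05  0     0     0.6   0.2   0     0.1   2.8   0     0     0     0.37
2024.05  0     0     0     0     0     0     0     0     0     0     0
2024.05  0.2   0     0.1   7.1   0.2   0     0.5   0     0.3   0     0.84
2024.05  24.8  0     79.8  0     93.6  88.6  94.3  0     92.8  68.7  54.26
2024.05  92.5  84.2  86.3  82.8  88.7  88.9  70.8  79.7  88.8  91.9  85.5
2024.05  92.6  93.5  94.4  97.3  92.8  92    97.1  82.9  98.7  92.8  93.41

Second metric (column label not preserved): ten per-column values followed by their average

Date     1       2       3       4       5       6       7       8       9       10      Avg
2024.05  546     490     362     479     344     309     758     299     518     600     470.5
2024.05  500     500     500     500     500     500     500     500     500     500     500
2024.05  -       -       -       -       -       -       -       -       -       -       -
2024.05  -       -       -       -       -       -       -       -       -       -       -
2024.05  -       -       -       -       -       -       -       -       -       -       -
2024.05  303     567     267     435     450     205     578     909     445     461     462
2024.05  25,000  25,000  25,000  25,000  25,000  25,000  25,000  25,000  25,000  25,000  25,000
2024.05  11,280  4,672   2,844   7,294   11,722  2,795   19,444  9,726   2,340   4,014   7,613.1
2024.05  2,092   23,478  2,306   6,105   3,049   3,406   3,600   2,589   6,109   3,528   5,626.2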