
Adversarial Robustness on HarmBench (DirReq, GCG, A.DAN, A.Pmpt, PAIR, TAP)

DirReq: 17

Base (8B Instruct)

-0.683.918.513.09
May 8, 2026
Updated 22d ago

Evaluation Results

Method Links
2026.05
172712182526
2026.05
9165143734
2026.05
500036
2026.05
3407171521
2026.05
3346121120
2026.05
11032619
2026.05
100033
2026.05
000026
2026.05
000001
2026.05
01617615