Adversarial Robustness Evaluation on HarmBench

1Attack Success Rate (ASR)

SafeMoE-XL

[Chart: Attack Success Rate (ASR) for SafeMoE-XL; y-axis ticks -0.64, 10.43, 21.5, 32.57; date label: May 30, 2026]
Updated 1d ago

Evaluation Results

Method  Links
2026.05  11.1
2026.05  202.1
2026.05  242.7
2026.05  423.15