Adversarial Robustness Evaluation on HarmBench
[Chart: Attack Success Rate (ASR); series shown: SafeMoE-XL; date shown: May 30, 2026. Available metrics: Attack Success Rate (ASR), Mean Jaccard Similarity (MJS).]
Updated 1d ago
Evaluation Results
Method             Details                   Date     Attack Success Rate (ASR)  Mean Jaccard Similarity (MJS)
SafeMoE-XL         Architecture=XL           2026.05  1                          1.1
SafeMoE-Qwen       Backbone=Qwen             2026.05  20                         2.1
Qwen3-4B-Instruct  Fine-tuning=Instruct,...  2026.05  24                         2.7
Mistral-SFT        Fine-tuning=SFT           2026.05  42                         3.15