Adversarial Robustness Evaluation on PKU-Safe
[Figure: Attack Success Rate (ASR) chart. Highlighted point: SafeMoE-XL, ASR 25, May 30, 2026. Metrics: Attack Success Rate (ASR), Mean Jitter Score (MJS).]
Updated 1d ago
Evaluation Results
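| Method            | Configuration            | Date    | Attack Success Rate (ASR) | Mean Jitter Score (MJS) |
|-------------------|--------------------------|---------|---------------------------|-------------------------|
| SafeMoE-XL        | Architecture=XL          | 2026.05 | 25                        | 2.78                    |
| SafeMoE-Qwen      | Backbone=Qwen            | 2026.05 | 32                        | 3.1                     |
| Qwen3-4B-Instruct | Fine-tuning=Instruct,... | 2026.05 | 48                        | 3.6                     |
| Mistral-SFT       | Fine-tuning=SFT          | 2026.05 | 62                        | 5.6                     |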