Safety Alignment Evaluation on LATharm
[Chart: Utility leaderboard plot. Highlighted entry: OpenAI Moderation, Utility 54 (May 13, 2026). Metric tabs: Utility, ASR, Harm Score (HS).]
Updated 19d ago
Evaluation Results
| Method | Configuration (truncated in source) | Date | Utility | ASR | Harm Score (HS) |
|---|---|---|---|---|---|
| OpenAI Moderation | Setting=fine-tuning wi... | 2026.05 | 54 | 75 | 3.86 |
| No defense | Fine-tuning=on combine... | 2026.05 | 53 | 98 | 4.96 |
| Backdoor | Setting=fine-tuning wi... | 2026.05 | 53 | 84 | 4.44 |
| GradShield | Setting=fine-tuning wi... | 2026.05 | 53 | 1 | 1.04 |
| Llamaguard | Setting=fine-tuning wi... | 2026.05 | 52 | 7 | 1.21 |
| SafeInstr | Setting=fine-tuning wi... | 2026.05 | 52 | 93 | 4.85 |
| Safe Lora | Setting=fine-tuning wi... | 2026.05 | 52 | 99 | 4.97 |
| SEAL | Setting=fine-tuning wi... | 2026.05 | 52 | 98 | 4.97 |
| Base | Fine-tuning=none (orig... | 2026.05 | 34 | 4 | 1.16 |