Safety Alignment Evaluation on LATharm

Utility: 54

OpenAI Moderation

Updated 2mo ago

Evaluation Results

Method	Links	Utility
OpenAI Moderation 2026.05		54	75	3.86
No defense 2026.05		53	98	4.96
Backdoor 2026.05		53	84	4.44
GradShield 2026.05		53	1	1.04
Llamaguard 2026.05		52	7	1.21
SafeInstr 2026.05		52	93	4.85
Safe Lora 2026.05		52	99	4.97
SEAL 2026.05		52	98	4.97
Base 2026.05		34	4	1.16