Safety Alignment Evaluation on RTA
[Chart: Utility, ASR, and HS by method, dated May 13, 2026. Highlighted result: No defense, Utility 53.]
Updated 19d ago
Evaluation Results
Method             Configuration              Date     Utility  ASR  HS
No defense         Fine-tuning=on combine...  2026.05  53       16   1.31
SEAL               Setting=fine-tuning wi...  2026.05  53       15   1.24
GradShield         Setting=fine-tuning wi...  2026.05  53       6    1.2
Llamaguard         Setting=fine-tuning wi...  2026.05  52       11   1.12
SafeInstr          Setting=fine-tuning wi...  2026.05  52       0    1
Backdoor           Setting=fine-tuning wi...  2026.05  52       2    1.12
Safe Lora          Setting=fine-tuning wi...  2026.05  52       18   1.37
OpenAI Moderation  Setting=fine-tuning wi...  2026.05  51       21   1.33
Base               Fine-tuning=none (orig...  2026.05  34       4    1.16