Safety-constrained Fine-tuning on AGNews
[Chart: Utility over time. Top Utility: 92 (No defense, May 13, 2026). Available metrics: Utility, ASR, Harm Score (HS).]
Updated 19d ago
Evaluation Results
Method      Date     Utility  ASR  Harm Score (HS)
No defense  2026.05  92       99   4.98
SafeInstr   2026.05  92       93   4.83
GradShield  2026.05  91       5    1.19
Moderation  2026.05  90       87   4.45
SafeLoRA    2026.05  90       94   4.94
Backdoor    2026.05  88       77   4.15
base        2026.05  75       4    1.16