Safety-constrained Fine-tuning on GSM8k
[Chart: Utility by date. Highlighted point: GradShield, Utility 87, May 13, 2026. Selectable metrics: Utility, ASR, Harmful Score (HS).]
Updated 19d ago
Evaluation Results
| Method     | Date    | Utility | ASR | Harmful Score (HS) |
|------------|---------|---------|-----|--------------------|
| GradShield | 2026.05 | 87      | 0   | 1                  |
| base       | 2026.05 | 86      | 4   | 1.16               |
| Moderation | 2026.05 | 85      | 52  | 2.4                |
| Backdoor   | 2026.05 | 84      | 78  | 4.22               |
| No defense | 2026.05 | 83      | 97  | 5                  |
| SafeInstr  | 2026.05 | 83      | 96  | 4.98               |
| SafeLoRA   | 2026.05 | 81      | 98  | 4.99               |