Finetuning with implicit harmful data on Identity-shift
[Chart: Utility, ASR, and HS by method. Top Utility: 53 (Safe Lora), May 13, 2026. Updated 19d ago.]
Evaluation Results
| Method | Setting | Date | Utility | ASR | HS |
|---|---|---|---|---|---|
| Safe Lora | Fine-tuning=on combine... | 2026.05 | 53 | 62 | 3.27 |
| OpenAI Moderation | Fine-tuning=on combine... | 2026.05 | 52 | 29 | 1.92 |
| Llamaguard | Fine-tuning=on combine... | 2026.05 | 52 | 53 | 2.98 |
| No defense | Fine-tuning=on combine... | 2026.05 | 51 | 75 | 3.75 |
| SafeInstr | Fine-tuning=on combine... | 2026.05 | 51 | 8 | 1.24 |
| Backdoor | Fine-tuning=on combine... | 2026.05 | 51 | 11 | 1.37 |
| GradShield | Fine-tuning=on combine... | 2026.05 | 51 | 1 | 1.01 |
| Base | Fine-tuning=none | 2026.05 | 34 | 0 | 1 |