Safety Alignment Evaluation on HEx-PHI (Harmful Response Rate)
May 25, 2026
Updated 7d ago
Evaluation Results
| Method | Backbone | Date | Harmful Response Rate |
|---|---|---|---|
| Sequential | Yi-1.5-9B | 2026.05 | 0.7 |
| Staged-Competence | Yi-1.5-9B | 2026.05 | 0.7 |
| Sqrt-Competence | Yi-1.5-9B | 2026.05 | 1.7 |
| Staged-Competence | Qwen3-8B | 2026.05 | 2.7 |
| Curri-DPO | Yi-1.5-9B | 2026.05 | 3 |
| Standard DPO (Baseline) | Yi-1.5-9B | 2026.05 | 8.7 |
| Staged-Competence | LLaMA-3-8B | 2026.05 | 10 |
| Sequential | LLaMA-3-8B | 2026.05 | 12.3 |
| Sqrt-Competence | LLaMA-3-8B | 2026.05 | 14 |
| Curri-DPO | LLaMA-3-8B | 2026.05 | 18 |
| Curri-DPO | Qwen3-8B | 2026.05 | 22.7 |
| Standard DPO (Baseline) | LLaMA-3-8B | 2026.05 | 24 |
| Sequential | Qwen3-8B | 2026.05 | 24.3 |
| Sqrt-Competence | Qwen3-8B | 2026.05 | 29 |
| Standard DPO (Baseline) | Qwen3-8B | 2026.05 | 30.7 |
| Unaligned | Yi-1.5-9B | 2026.05 | 72.7 |
| Unaligned | Qwen3-8B | 2026.05 | 86 |
| Unaligned | LLaMA-3-8B | 2026.05 | 91.7 |