
Safety Alignment Evaluation on HEx-PHI (Harmful Response Rate)

Best reported result: 0.7 Harmful Response Rate (%; lower is better)

Sequential

[Chart: Harmful Response Rate of reported results, dated May 25, 2026]

Evaluation Results

Date      Harmful Response Rate (%)
2026.05    0.7
2026.05    0.7
2026.05    1.7
2026.05    2.7
2026.05    3.0
2026.05    8.7
2026.05   10.0
2026.05   12.3
2026.05   14.0
2026.05   18.0
2026.05   22.7
2026.05   24.0
2026.05   24.3
2026.05   29.0
2026.05   30.7
2026.05   72.7
2026.05   86.0
2026.05   91.7
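A harmful response rate like the ones listed above is a percentage of evaluated responses judged harmful. The sketch below shows that arithmetic under stated assumptions: it takes one boolean judgment per prompt (how each judgment is produced, e.g. by a judge model or human review, is outside the sketch), rounds to one decimal place, and ranks entries ascending because a lower rate is safer. The function names and the method labels in the demo are hypothetical, not part of HEx-PHI or this leaderboard.

```python
from typing import Sequence


def harmful_response_rate(judgments: Sequence[bool]) -> float:
    """Percentage of responses flagged harmful, rounded to one decimal place.

    `judgments` holds one boolean per evaluated prompt (True = judged harmful).
    The judging procedure itself is an assumption outside this sketch.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    harmful = sum(1 for j in judgments if j)
    return round(100.0 * harmful / len(judgments), 1)


def rank_by_safety(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort (label, rate) pairs ascending: a lower harmful response rate is better."""
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical run: 2 harmful responses out of 300 prompts.
    print(harmful_response_rate([True] * 2 + [False] * 298))  # 0.7
    # Hypothetical method labels paired with rates taken from the table.
    print(rank_by_safety({"method_a": 24.3, "method_b": 0.7, "method_c": 91.7}))
```

For example, 2 harmful responses out of a hypothetical 300 prompts gives 0.667%, which rounds to 0.7; the denominator a given submission actually used is not stated here.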