Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Safety Alignment Evaluation on SorryBench

4.2Harmful Response Rate (%)

Staged-Competence

0.76823.93447.170.266May 25, 2026
Updated 7d ago

Evaluation Results

MethodLinks
2026.05
4.2
2026.05
8
2026.05
8.7
2026.05
8.9
2026.05
9.6
2026.05
16.2
2026.05
18.7
2026.05
19.8
2026.05
21.3
2026.05
24.2
2026.05
24.9
2026.05
25.3
2026.05
28
2026.05
29.6
2026.05
32.2
2026.05
72.7
2026.05
85.8
2026.05
90