Safety Alignment on Refusal Evaluation Dataset

Refusal Rate: 99

Base

[Chart: Jan 22, 2026]
Updated 4d ago

Evaluation Results

Date     Refusal Rate  (unlabeled)
2026.01  99            -
2026.01  99            -
2026.01  98            -
2026.01  98            -
2026.01  97            -
2026.01  95            -
2026.01  94            -
2026.01  92            -
2026.01  14            0.81
2026.01  8             0.88
2026.01  5             0.92
2026.01  2             0.96
2026.01  1             0.98
2026.01  1             0.98
2026.01  0             0.99
2026.01  0             0.99