
Harmful Request Defense on SORRY-Bench

Best ASR: 13 (Self-Guard)

Jan 31, 2026

Evaluation Results

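Attack success rate (ASR) on SORRY-Bench; lower is better.

| Method | Date | ASR | Links |
|---|---|---|---|
| Self-Guard | 2026.01 | 13 | - |
| | 2026.01 | 14.3 | - |
| | 2026.01 | 15.7 | - |
| | 2026.01 | 16.1 | - |
| | 2026.01 | 16.6 | - |
| | 2026.01 | 18.2 | - |
| | 2026.01 | 18.4 | - |
| | 2026.01 | 18.9 | - |
| | 2026.01 | 18.9 | - |
| | 2026.01 | 20.7 | - |
| | 2026.01 | 22.5 | - |
| | 2026.01 | 22.7 | - |
| | 2026.01 | 22.7 | - |
| | 2026.01 | 22.7 | - |
| | 2026.01 | 27.2 | - |
| | 2026.01 | 43 | - |
| | 2026.01 | 45.9 | - |
| | 2026.01 | 47.5 | - |
| | 2026.01 | 50.9 | - |
| | 2026.01 | 52.3 | - |
| | 2026.01 | 52.7 | - |
| | 2026.01 | 53 | - |
| | 2026.01 | 53.2 | - |
| | 2026.01 | 53.9 | - |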