Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Toxicity Steering on 15 prefix prompts length 50

71.2Toxicity Accuracy

ILRR

-2.01616.9923655.008Jan 29, 2026
Updated 3d ago

Evaluation Results

MethodLinks
2026.01
71.2
2026.01
60.7
2026.01
15.7
2026.01
9
2026.01
8.3
2026.01
7.2
2026.01
3.8
2026.01
2.4
2026.01
1.9
2026.01
1.4
2026.01
0.8