
Safety Defense on Unsafe Prompts 7 categories

[Chart: Hate Rate (y-axis ticks 35.2496, 52.0598, 68.87, 85.6802, 100). Label: DTVI. Date: Mar 23, 2026]
Updated 25d ago

Evaluation Results

Method   Links
2026.03   100     77.78   89.33   82.46   94.74   82.76   92.86   88.56
2026.03   79.25   68.57   75      82.14   84.51   64.29   81.48   76.46
2026.03   41.18   8.57    24      31.58   8       20.69   14.81   21.26
2026.03   37.74   -2.86   3.95    1.75    -12.33  -3.45   15.38   5.74