Toxicity and Harmful Content Detection on AgentHarm

Score: 94.69

JT-Safe-V2-35B

[Figure: score chart; date label: May 23, 2026]
Updated 8d ago

Evaluation Results

Date      Score   Method   Links
2026.05   94.69
2026.05   94.62
2026.05   92.77
          90
2026.05   84.23