Content Moderation on Safety Evaluation Set Moderation (held-out target labels)

0.89 AUROC

Safety Monitor (In-model)

May 12, 2026
Updated 21d ago

Evaluation Results

Method  Links
2026.05  0.89
2026.05  0.867
2026.05  0.857
2026.05  0.836
2026.05  0.82
0.512