
OpenAI Moderation

Benchmarks

Task Name                               | Dataset Name                  | SOTA Result               | Trend
Prompt Classification                   | OpenAI Moderation Text Prompt | F1 Score: 88.89           | 14
Unsafe content categorization           | OpenAI Moderation             | Accuracy: 88.35           | 9
Out-of-Taxonomy Risk Detection          | OpenAI Moderation             | Out-of-Taxonomy F1: 67.92 | 4
OOD safety category inference (Stage 2) | OpenAI Moderation             | Mean Reward: 36.45        | 4