
ToxicChat

Benchmarks

Task Name                   Dataset Name                     SOTA Result        Trend
Unsafe Prompt Detection     ToxicChat (test)                 Precision 0.815    16
Prompt Classification       ToxicChat Text Prompt            F1 Score 96.27     14
Safety Classification       ToxicChat                        F1 Score 0.81      14
Safety Classification       ToxicChat (out-of-distribution)  F1 Score 72.88     11
Prompt-only Safety Routing  ToxicChat                        Routing F1 56.82   10
Toxicity Detection          ToxicChat                        F1 Score 1         9
Toxicity Detection          ToxicChat (test)                 Accuracy 0.9772    4
Unsafe prompt detection     ToxicChat                        AUPRC 75.5         4
Safety Classification       ToxicChat (in-distribution)      F1 Score (%) 82.2  2