
ToxicChat

Benchmarks

Task Name                     | Dataset Name                      | SOTA Result       | Trend
------------------------------|-----------------------------------|-------------------|------
Input Moderation              | ToxicChat (test)                  | F1 Score: 82.8    | 22
Unsafe Prompt Detection       | ToxicChat (test)                  | Precision: 0.815  | 16
Safety Refusal                | ToxicChat                         | Refusal Rate: 95  | 15
Prompt Classification         | ToxicChat Text Prompt             | F1 Score: 96.27   | 14
Safety Classification         | ToxicChat                         | F1 Score: 0.81    | 14
Safety Classification         | ToxicChat (out-of-distribution)   | F1 Score: 72.88   | 11
Prompt-only Safety Routing    | ToxicChat                         | Routing F1: 56.82 | 10
Toxicity Detection            | ToxicChat                         | F1 Score: 1       | 9
Content Safety Classification | ToxicChat                         | Precision: 75.46  | 6
Toxicity Detection            | ToxicChat (test)                  | Accuracy: 0.9772  | 4
Unsafe prompt detection       | ToxicChat                         | AUPRC: 75.5       | 4
Safety Detection              | ToxicChat                         | F1 Score: 65      | 3
Safety Classification         | ToxicChat (in-distribution)       | F1 Score (%): 82.2 | 2