| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Input Moderation | ToxicChat (test) | F1 Score | 82.8 | 22 |
| Unsafe Prompt Detection | ToxicChat (test) | Precision | 0.815 | 16 |
| Safety Refusal | ToxicChat | Refusal Rate | 95 | 15 |
| Prompt Classification | ToxicChat Text Prompt | F1 Score | 96.27 | 14 |
| Safety Classification | ToxicChat | F1 Score | 0.81 | 14 |
| Safety Classification | ToxicChat (out-of-distribution) | F1 Score | 72.88 | 11 |
| Prompt-only Safety Routing | ToxicChat | Routing F1 | 56.82 | 10 |
| Toxicity Detection | ToxicChat | F1 Score | 1 | 9 |
| Content Safety Classification | ToxicChat | Precision | 75.46 | 6 |
| Toxicity Detection | ToxicChat (test) | Accuracy | 0.9772 | 4 |
| Unsafe Prompt Detection | ToxicChat | AUPRC | 75.5 | 4 |
| Safety Detection | ToxicChat | F1 Score | 65 | 3 |
| Safety Classification | ToxicChat (in-distribution) | F1 Score (%) | 82.2 | 2 |