| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Unsafe Prompt Detection | ToxicChat (test) | Precision | 0.815 | 16 |
| Prompt Classification | ToxicChat Text Prompt | F1 Score | 96.27 | 14 |
| Safety Classification | ToxicChat | F1 Score | 0.81 | 14 |
| Safety Classification | ToxicChat (out-of-distribution) | F1 Score | 72.88 | 11 |
| Prompt-only Safety Routing | ToxicChat | Routing F1 | 56.82 | 10 |
| Toxicity Detection | ToxicChat | F1 Score | 1 | 9 |
| Toxicity Detection | ToxicChat (test) | Accuracy | 0.9772 | 4 |
| Unsafe Prompt Detection | ToxicChat | AUPRC | 75.5 | 4 |
| Safety Classification | ToxicChat (in-distribution) | F1 Score (%) | 82.2 | 2 |