| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LLM Safety | Safety Evaluation Set | Harmful Response Rate: 1.66 | 25 |
| Content Moderation | Safety Evaluation Set Moderation (held-out target labels) | AUROC: 0.89 | 6 |
| Sentiment Analysis | Safety Evaluation Set Sentiment (held-out target labels) | AUROC: 97.5 | 6 |
| Jailbreaking Detection | Safety Evaluation Set Jailbreaking (held-out target labels) | AUROC: 97.4 | 6 |
| Toxicity Detection | Safety Evaluation Set Toxicity (held-out target labels) | AUROC: 97.6 | 6 |