| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Prompt Classification | OpenAI Moderation Text Prompt | F1 Score | 88.89 | 14 |
| Unsafe Content Categorization | OpenAI Moderation | Accuracy | 88.35 | 9 |
| Out-of-Taxonomy Risk Detection | OpenAI Moderation | Out-of-Taxonomy F1 | 67.92 | 4 |
| OOD Safety Category Inference (Stage 2) | OpenAI Moderation | Mean Reward | 36.45 | 4 |