| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Response Classification | Aegis Text Response 2.0 | F1 Score 86.3 | 32 |
| Prompt classification | Aegis 2.0 | F1 Score 87.3 | 32 |
| Prompt classification | Aegis | F1 Score 89.6 | 32 |
| Explainability classification | AEGIS 2.0 (test) | Unsafe F1 79.6 | 27 |
| Input Moderation | AEGIS (test) | F1 Score 91.4 | 26 |
| Harmfulness Detection | Aegis | Macro F1 89.78 | 25 |
| Safety classification | AEGIS 2.0 (test) | AUC 94 | 24 |
| Input Moderation | AEGIS 2.0 (test) | F1 Score 87.9 | 22 |
| Content Moderation | Aegis 2.0 | F1 Score 86.1 | 21 |
| Safety Moderation | Aegis 2.0 | Prompt F1 86.4 | 15 |
| Prompt Classification | Aegis Text Prompt 2.0 | F1 Score 83.52 | 14 |
| Text-based safety moderation | Aegis | F1 Score 84 | 12 |
| Self-Harm Risk Screening | AEGIS (N=161) 2.0 | Escalation Rate (Esc.) 100 | 10 |
| Unsafe content categorization | Aegis 2.0 | Accuracy 63.18 | 9 |
| Harmfulness Detection | Aegis 2.0 | Macro F1 83.4 | 8 |
| Safety Moderation | Aegis Response 2.0 | F1 Score 87.6 | 7 |
| Safety Moderation | Aegis Prompt 2.0 | F1 Score 87.9 | 7 |
| Safety Moderation | Aegis Prompt 1.0 | F1 Score 91.9 | 7 |
| Multi-label Safety Categorization | aegis categories | Macro Accuracy 62.84 | 4 |
| Out-of-Taxonomy Risk Detection | Aegis 2.0 | F1 Score 63.62 | 4 |
| OOD safety category inference (Stage 2) | Aegis 2.0 | Reward Mean 23.08 | 4 |
| Content Moderation | Aegis In-Distribution | Pornography Score 75 | 2 |