| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Explainability classification | AEGIS 2.0 (test) | Unsafe F1 | 79.6 | 27 |
| Response Classification | Aegis Text Response 2.0 | F1 Score | 82.27 | 16 |
| Prompt classification | Aegis 2.0 | F1 Score | 87.3 | 16 |
| Prompt classification | Aegis | F1 Score | 89.6 | 16 |
| Prompt Classification | Aegis Text Prompt 2.0 | F1 Score | 83.52 | 14 |
| Text-based safety moderation | Aegis | F1 Score | 84 | 12 |
| Content Moderation | Aegis 2.0 | F1 Score | 80 | 10 |
| Unsafe content categorization | Aegis 2.0 | Accuracy | 63.18 | 9 |
| Safety Moderation | Aegis Response 2.0 | F1 Score | 87.6 | 7 |
| Safety Moderation | Aegis Prompt 2.0 | F1 Score | 87.9 | 7 |
| Safety Moderation | Aegis Prompt 1.0 | F1 Score | 91.9 | 7 |
| Out-of-Taxonomy Risk Detection | Aegis 2.0 | F1 Score | 63.62 | 4 |
| OOD safety category inference (Stage 2) | Aegis 2.0 | Reward Mean | 23.08 | 4 |