| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Unsafe content categorization | ProGuard Text | Accuracy | 76.96 | 9 |
| Unsafe content categorization | ProGuard Text-Image | Accuracy | 0.6997 | 6 |
| Unsafe content categorization | ProGuard Image | Accuracy | 76.02 | 5 |
| OOD safety category inference (Stage 2) | ProGuard Text-Image | Mean Reward | 26.86 | 4 |
| Out-of-Taxonomy Risk Detection | ProGuard Image | F1 Score | 57.59 | 4 |
| Out-of-Taxonomy Risk Detection | ProGuard Text-Image | F1 Score (%) | 60.25 | 4 |
| Out-of-Taxonomy Risk Detection | ProGuard Text | F1 Score | 56.94 | 4 |
| OOD safety category inference (Stage 2) | ProGuard Image | Mean Reward | 25.95 | 4 |