| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Evaluation | UnsafeBench | F1 Score | 89 | 24 |
| Visual Compliance Verification | UnsafeBench | Unsafe F1 | 76 | 15 |
| Binary Safety Classification | UnsafeBench | Sexual | 35.5 | 13 |
| Safety Evaluation | UnsafeBench (test) | F1 Score | 81 | 11 |
| Content Moderation | UnsafeBench Sexual category (test) | Accuracy | 81.4 | 8 |
| Multimodal Content Moderation | UnsafeBench | Accuracy | 76.7 | 4 |
| Multimodal Content Moderation | UnsafeBench Sexual Text-Only | Accuracy | 81.82 | 3 |
| Multimodal Content Moderation | UnsafeBench Sexual Text+Visual | Accuracy | 81.08 | 3 |