| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Response Harmfulness Classification | WildGuard (test) | F1 (Total) | 79.48 | 30 |
| Safety Evaluation | Wildguard (test) | Wildguard Test Score | 0.08 | 27 |
| Prompt Classification | WildGuard Text Prompt | F1 Score | 90.46 | 14 |
| Refusal Detection | WILDGUARD (test) | F1 (Harmful) | 94 | 14 |
| Text-based safety moderation | WildGuard | F1 Score | 78.6 | 12 |
| Prompt Harmfulness Classification | WILDGUARD (test) | F1 (Total) | 88.9 | 12 |
| Jailbreak Attack | WildGuard (test) | ASR | 82.64 | 8 |
| Safety Moderation | WildGuard Prompt | F1 Score | 89.5 | 7 |
| Audio Safety Moderation | WildGuard-TTS | F1 Score | 88.4 | 7 |
| SCAV-Embedding Attack Defense | Wildguard (test) | ASR | 28.84 | 4 |