| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Response Harmfulness Classification | WildGuard (test) | F1 (Total) | 79.48 | 30 |
| Safety Evaluation | Wildguard (test) | Wildguard Test Score | 0.08 | 27 |
| Input Moderation | WildGuard (test) | F1 Score | 89.5 | 22 |
| Prompt Classification | WildGuard Text Prompt | F1 Score | 90.46 | 14 |
| Refusal Detection | WILDGUARD (test) | F1 (Harmful) | 94 | 14 |
| Text-based safety moderation | WildGuard | F1 Score | 78.6 | 12 |
| Prompt Harmfulness Classification | WILDGUARD (test) | F1 (Total) | 88.9 | 12 |
| Output Moderation | WildGuard (test) | F1 Score | 79.5 | 11 |
| Jailbreak Attack | WildGuard (test) | ASR | 82.64 | 8 |
| Refusal Evaluation | WildGuard Harmful | Refusal Rate | 84.35 | 7 |
| Over-refusal Evaluation | WildGuard Unharmful | Over-refusal Rate | 1.06 | 7 |
| Safety Moderation | WildGuard Prompt | F1 Score | 89.5 | 7 |
| Audio Safety Moderation | WildGuard-TTS | F1 Score | 88.4 | 7 |
| SCAV-Embedding Attack Defense | Wildguard (test) | ASR | 28.84 | 4 |