| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Harmfulness Detection | WildGuard | Macro F1 Score | 90.6 | 47 |
| Response Harmfulness Classification | WildGuard (test) | F1 (Total) | 79.48 | 30 |
| Safety Evaluation | WildGuard (test) | WildGuard Test Score | 0.08 | 27 |
| Value Alignment | WildGuard (test) | Score | 29.69 | 24 |
| Input Moderation | WildGuard (test) | F1 Score | 89.5 | 22 |
| Prompt Harmfulness Classification | WildGuard (test) | F1 Score | 89.44 | 18 |
| Safety Classification | WildGuard (test) | F1 Score | 88.5 | 17 |
| Safety Evaluation | WildGuard | WildGuard Refusal Rate | 25.5 | 16 |
| Prompt Classification | WildGuard Text Prompt | F1 Score | 90.46 | 14 |
| Refusal Detection | WildGuard (test) | F1 (Harmful) | 94 | 14 |
| Text-based Safety Moderation | WildGuard | F1 Score | 78.6 | 12 |
| Binary Safety Classification | WildGuard prompt safety | Macro F1 | 97.91 | 11 |
| Output Moderation | WildGuard (test) | F1 Score | 79.5 | 11 |
| Stealthiness Evaluation | WildGuard 7B | Mean Perplexity | 2.33 | 10 |
| Streaming Safety Detection | WildGuard (test) | Det@1 | 83.45 | 8 |
| Jailbreak Attack | WildGuard (test) | ASR | 82.64 | 8 |
| Violation Detection | WildGuard (test) | Safety F1 | 89.17 | 7 |
| Refusal Evaluation | WildGuard Harmful | Refusal Rate | 84.35 | 7 |
| Over-refusal Evaluation | WildGuard Unharmful | Over-refusal Rate | 1.06 | 7 |
| Safety Moderation | WildGuard Prompt | F1 Score | 89.5 | 7 |
| Audio Safety Moderation | WildGuard-TTS | F1 Score | 88.4 | 7 |
| Multi-label Safety Categorization | WildGuard prompt subcategory | Macro Accuracy | 83.35 | 4 |
| SCAV-Embedding Attack Defense | WildGuard (test) | ASR | 28.84 | 4 |
| Safety Classification | WildGuard overall (test) | Accuracy | 85.6 | 2 |
| Jailbreaking | WildGuard 7B | Bypass Rate | 99.81 | 1 |