| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Detection | AgentHazard Strongest | Accuracy: 90.87 | 56 |
| Agentic Safety Moderation | AgentHazard | ASR: 109 | 5 |
| Dataset Diversity and Coverage Evaluation | AgentHazard full | Goal-text Entropy: 0.93 | 1 |
| Dataset Diversity and Coverage Evaluation | AgentHazard 3-app | Goal-text Entropy: 0.976 | 1 |