| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Agentic Oversight | Agent-SafetyBench | Detection Accuracy: 84.06 | 42 |
| Agent Safety Evaluation | Agent-SafetyBench (aggregated clean and five attack types) | UBR: 26.31 | 30 |
| Agent Safety Evaluation | Agent-SafetyBench | Agent-SafetyBench Score: 72.3 | 8 |