| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Agent Behavioral Safety and Helpfulness Evaluation | ToolEmu | Safety Rate: 97.9 | 42 |
| Agent Safety Evaluation | ToolEmu | Safety: 99 | 36 |
| Policy-refinement | ToolEmu | IOR: 33.1 | 16 |
| Graph-based Agent Memory Poisoning | ToolEmu | Utilization (Util.): 97 | 5 |
| Propagation Detection | ToolEmu filtered 79-case injection-like | Precision: 62.3 | 2 |