| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Illicit task completion | AgentHarm English prompts | AgentHarm Score (AHS): 72.7 | 20 |
| Step-level tool invocation safety detection | AgentHarm Traj | Accuracy: 84.81 | 20 |
| Guarded Agent Evaluation | AgentHarm latest (full) | Refusal Rate: 97.16 | 14 |
| Benign tool-calling reliability | AgentHarm Benign | Refusal Rate: 0 | 10 |
| Malicious behavior measurement | AgentHarm Harmful | Harm Rate: 6 | 10 |
| Agent Harm Evaluation | AgentHarm public | HarmScore: 9.6 | 8 |
| Agent Perturbation Reliability Testing | AgentHarm (Agent Perturbation Reliability Tests) | Accuracy: 90.6 | 5 |
| Safety Evaluation | AgentHarm | Cost per Accuracy Point ($): 0.0001 | 4 |