| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Malicious behavior measurement | AgentHarm Harmful | Harm Rate | 0 | 33 |
| Agent Safety Evaluation | AgentHarm Libra | Score | 83 | 27 |
| Agent Safety Evaluation | AgentHarm Benign Requests | Safety Score | 79 | 27 |
| Agent Safety Evaluation | AgentHarm Harmful Requests | Score | 59 | 27 |
| LLM Agent Utility | AgentHarm Benign Requests | Utility Score | 69.1 | 23 |
| Illicit task completion | AgentHarm English prompts | AgentHarm Score (AHS) | 72.7 | 20 |
| Step-level tool invocation safety detection | AgentHarm Traj | Accuracy | 84.81 | 20 |
| Jailbreak Attack | AgentHarm | Attack Success Score (ASS) | 2 | 18 |
| Agent behavioral safety | AgentHarm | Safety Rate | 90.6 | 14 |
| Guarded Agent Evaluation | AgentHarm latest (full) | Refusal Rate | 97.16 | 14 |
| Agent Safety Evaluation | AgentHarm (held-out) | HCR | 12.5 | 10 |
| Benign tool-calling reliability | AgentHarm Benign | Refusal Rate | 0 | 10 |
| Agent Harm Evaluation | AgentHarm public | HarmScore | 9.6 | 8 |
| Toxicity and Harmful Content Detection | AgentHarm | Score | 94.69 | 5 |
| Agent Perturbation Reliability Testing | AgentHarm (Agent Perturbation Reliability Tests) | Accuracy | 90.6 | 5 |
| Safety | AgentHarm | Harm Score | 53.8 | 4 |
| Safety Evaluation | AgentHarm | Cost per Accuracy Point ($) | 0.0001 | 4 |
| Safety classification | AgentHarm (val) | Safety Score | 100 | 2 |