| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Contextual vulnerability assessment | OpenClaw | Risk 1 AGS Score: 93 | 9 |
| Long-term state poisoning evaluation | OpenClaw Injection Tool | Harm Score (HS): 3.66 | 8 |
| Prompt Injection | OpenClaw (140 adversarial instances) | Defense Success Rate: 90 | 7 |
| Malicious command interception | OpenClaw 1.0 (test) | Defense Rate: 83 | 6 |
| Long-term state poisoning evaluation | OpenClaw ZH | Harm Score (HS): 3.82 | 4 |
| Long-term state poisoning evaluation | OpenClaw EN | Harm Score (HS): 3.76 | 4 |
| Long-term state poisoning evaluation | OpenClaw Average across conversation variants | Harm Score (HS): 4.35 | 4 |
| Long-term state poisoning evaluation | OpenClaw Web Content | Harm Score (HS): 4.03 | 4 |
| Long-term state poisoning evaluation | OpenClaw Log Replay | Harm Score (HS): 4.34 | 4 |
| Long-term state poisoning evaluation | OpenClaw Routine | Harm Score (HS): 3.9 | 4 |
| Threat Detection | OpenClaw (140 adversarial instances) | Defense Success Rate: 85 | 4 |
| Credential Leakage | OpenClaw (140 adversarial instances) | Defense Success Rate: 85 | 4 |
| Remote Code Execution Attack Success Rate | OpenClaw | C-F Score: 65 | 3 |
| Malicious Skill | OpenClaw (140 adversarial instances) | Defense Success Rate: 90 | 3 |
| Configuration Modification | OpenClaw (140 adversarial instances) | Defense Success Rate: 90 | 3 |
| End-to-End Safety Blocking | OpenClaw | Parent Runs: 539 | 2 |
| Dangerous Command | OpenClaw (140 adversarial instances) | Defense Success Rate: 90 | 2 |
| Autonomous Research Agent Capability Assessment | OpenCLAW-P2P | Metric: - | 0 |