| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Utility Assessment | AgentDojo (test) | Utility | 100 | 128 |
| Adversarial Attack Success Rate Assessment | AgentDojo | ASR | 0 | 56 |
| Agent Task Performance | AgentDojo Travel | Attack Success Rate | 13.57 | 24 |
| Prompt Injection Defense | AgentDojo New Attack 2 | Utility under Attack (UA) | 89.78 | 23 |
| Prompt Injection Defense | AgentDojo New Attack 1 | Utility under Attack | 89.88 | 23 |
| Prompt Injection Defense | AgentDojo Important Instructions | Utility under Attack | 0.9041 | 23 |
| Prompt Injection Defense | AgentDojo No Attack | Benign Utility | 92.78 | 23 |
| Agent Task Performance | AgentDojo Banking | Attack Success Rate | 62.5 | 18 |
| Agent Planning | AgentDojo | TCR @ ∞ | 80 | 16 |
| Guarded Agent Evaluation | AgentDojo full latest | ASR | 0.5616 | 14 |
| Indirect Prompt Injection | AgentDojo | Benign Utility | 64.36 | 12 |
| LLM Agent Defense | AgentDojo Overall | Clean Utility | 84.54 | 12 |
| LLM Agent Defense | AgentDojo Slack | Clean Utility | 80.95 | 12 |
| LLM Agent Defense | AgentDojo Workspace | Clean Utility | 85 | 12 |
| Prompt Injection Attack | AgentDojo Slack suite | Baseline ASR | 14.4 | 9 |
| Prompt Injection Defense | AgentDojo | Benign Utility | 77.3 | 8 |
| Prompt Injection Prevention | AgentDojo (test) | Banking Success Rate | 36 | 7 |
| Tool-agent system evaluation | Agentdojo | Utility (No Attack) | 27.4 | 6 |
| Agentic Task Execution | AgentDojo Total | BU | 78.5 | 6 |
| Agentic Task Execution | AgentDojo Slack | BU | 86.7 | 6 |
| Agentic Task Execution | AgentDojo Banking | BU | 91.2 | 6 |
| Agentic Task Execution | AgentDojo Workspace | BU | 76.5 | 6 |
| Agentic Prompt Injection Defense | AgentDojo (test) | ASR (Direct) | 0 | 6 |
| Prompt Injection Defense | AgentDojo Overall v1 (test) | CU | 37.11 | 6 |
| Prompt Injection Defense | AgentDojo Slack suite v1 (test) | CU | 61.9 | 6 |