| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Utility Assessment | AgentDojo (test) | Utility: 100 | 128 | |
| Adversarial Attack Success Rate Assessment | AgentDojo | ASR: 0 | 88 | |
| Utility Evaluation | AgentDojo | Utility: 78.4 | 32 | |
| Indirect Prompt Injection Defense Evaluation | AgentDojo TOOLKNOWLEDGE attack suite | Latency (s): 6.66 | 24 | |
| Agent Task Performance | AgentDojo Travel | Attack Success Rate: 13.57 | 24 | |
| Prompt Injection Defense | AgentDojo New Attack 2 | Utility under Attack (UA): 89.78 | 23 | |
| Prompt Injection Defense | AgentDojo New Attack 1 | Utility under Attack: 89.88 | 23 | |
| Prompt Injection Defense | AgentDojo Important Instructions | Utility under Attack: 0.9041 | 23 | |
| Prompt Injection Defense | AgentDojo No Attack | Benign Utility: 92.78 | 23 | |
| Agentic Security and Utility Evaluation | AgentDojo | ASR: 0 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Average across attacks | UA: 13.18 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ToolKnowledge | Utility Score: 59.64 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ImportantMsgs | Utility (UA): 59.3 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Combined | UA: 73.58 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo IgnorePrevious | Utility (UA): 73.92 | 22 | |
| LLM Agent Task Completion | AgentDojo No Attack | Benign Utility: 73.91 | 22 | |
| Prompt Injection Attack | AgentDojo | ASR@164 | 21 | |
| Agentic Security Evaluation | AgentDojo v1 (97 benign tasks, 27 injection tasks) | Utility Score: 60.8 | 20 | |
| Agent Task Performance | AgentDojo Banking | Attack Success Rate: 62.5 | 18 | |
| Agent Planning | AgentDojo | TCR @ ∞: 80 | 16 | |
| Agent behavioral safety | AgentDojo | Safety Rate: 97.1 | 14 | |
| Guarded Agent Evaluation | AgentDojo full latest | ASR: 0.5616 | 14 | |
| Prompt Injection Defense | AgentDojo | Benign Utility: 77.3 | 13 | |
| Agent Security (Indirect Prompt Injection) | AgentDojo Overall (test) | ASR: 11.6 | 12 | |
| Agent Security (Indirect Prompt Injection) | AgentDojo Slack (test) | ASR: 13.9 | 12 | |