| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Utility Assessment | AgentDojo (test) | Utility | 100 | 128 |
| Adversarial Attack Success Rate Assessment | AgentDojo | ASR | 0 | 88 |
| Utility Evaluation | AgentDojo | Utility | 78.4 | 32 |
| Indirect Prompt Injection Defense Evaluation | AgentDojo TOOLKNOWLEDGE attack suite | Latency (s) | 6.66 | 24 |
| Agent Task Performance | AgentDojo Travel | Attack Success Rate | 13.57 | 24 |
| Prompt Injection Defense | AgentDojo New Attack 2 | Utility under Attack (UA) | 89.78 | 23 |
| Prompt Injection Defense | AgentDojo New Attack 1 | Utility under Attack | 89.88 | 23 |
| Prompt Injection Defense | AgentDojo Important Instructions | Utility under Attack | 0.9041 | 23 |
| Prompt Injection Defense | AgentDojo No Attack | Benign Utility | 92.78 | 23 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Average across attacks | UA | 13.18 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ToolKnowledge | Utility Score | 59.64 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ImportantMsgs | Utility (UA) | 59.3 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Combined | UA | 73.58 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo IgnorePrevious | Utility (UA) | 73.92 | 22 |
| LLM Agent Task Completion | AgentDojo No Attack | Benign Utility | 73.91 | 22 |
| Prompt Injection Attack | AgentDojo | ASR@1 | 64 | 21 |
| Agent Task Performance | AgentDojo Banking | Attack Success Rate | 62.5 | 18 |
| Agent Planning | AgentDojo | TCR @ ∞ | 80 | 16 |
| Guarded Agent Evaluation | AgentDojo full latest | ASR | 0.5616 | 14 |
| Prompt Injection Defense | AgentDojo | Benign Utility | 77.3 | 13 |
| Agent Defense Evaluation | AgentDojo | Utility under Attack | 69.15 | 12 |
| Indirect Prompt Injection | AgentDojo | Benign Utility | 64.36 | 12 |
| LLM Agent Defense | AgentDojo Overall | Clean Utility | 84.54 | 12 |
| LLM Agent Defense | AgentDojo Slack | Clean Utility | 80.95 | 12 |
| LLM Agent Defense | AgentDojo Workspace | Clean Utility | 85 | 12 |
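Most rows above report one of three related quantities: benign (clean) utility, utility under attack (UA), and attack success rate (ASR). The sketch below shows how these are commonly computed, under stated assumptions. It assumes each agent run is logged as one record. A run with no injection counts toward benign utility. An attacked run, meaning a (user task, injection task) pair, counts toward UA and ASR. The `Episode` class and its field names are hypothetical illustrations, not the AgentDojo API, and individual papers may define or aggregate these metrics differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    # Hypothetical log record for one agent run (not AgentDojo's own schema).
    user_task: str
    injection_task: Optional[str]  # None means a benign run with no injection
    user_task_solved: bool         # did the agent complete the user's task?
    injection_succeeded: bool      # did the agent carry out the attacker's goal?


def pct(numer: int, denom: int) -> float:
    # Express a count as a percentage; return 0.0 for an empty set of runs.
    return 100.0 * numer / denom if denom else 0.0


def benign_utility(episodes: List[Episode]) -> float:
    # Share of benign runs in which the user task was solved.
    benign = [e for e in episodes if e.injection_task is None]
    return pct(sum(e.user_task_solved for e in benign), len(benign))


def utility_under_attack(episodes: List[Episode]) -> float:
    # Share of attacked runs in which the user task was still solved.
    attacked = [e for e in episodes if e.injection_task is not None]
    return pct(sum(e.user_task_solved for e in attacked), len(attacked))


def attack_success_rate(episodes: List[Episode]) -> float:
    # Share of attacked runs in which the injected goal was executed.
    attacked = [e for e in episodes if e.injection_task is not None]
    return pct(sum(e.injection_succeeded for e in attacked), len(attacked))


# Toy data: two benign runs and four attacked runs (values are made up).
episodes = [
    Episode("u1", None, True, False),
    Episode("u2", None, False, False),
    Episode("u1", "i1", True, False),
    Episode("u1", "i2", False, True),
    Episode("u2", "i1", True, True),
    Episode("u2", "i2", True, False),
]

print(benign_utility(episodes))        # 50.0
print(utility_under_attack(episodes))  # 75.0
print(attack_success_rate(episodes))   # 50.0
```

A defense therefore aims for high benign utility and UA together with a low ASR. An attack aims for a high ASR. Because UA and ASR are measured on the same attacked runs, one run can count toward both, as the `("u2", "i1")` record does here.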