| Dataset Name | SOTA Method | Metric | Value | Trend | Last updated |
|---|---|---|---|---|---|
| InjecAgent | Direct | ASR @ 1 Attempt | 0 | 32 | 1mo ago |
| Direct Scenario | StruQ | ASR | 2.88 | 28 | 9d ago |
| SQuAD v2 | PAIR | ASR | 0 | 27 | 9d ago |
| AgentDojo | PISmith | ASR@1 | 64 | 21 | 1mo ago |
| Tool-Completion (TCA) | CAHL | ASR | 0.12 | 14 | 1mo ago |
| NavGPT (test) | | Navigation Error | 7.07 | 12 | 1mo ago |
| AgentDojo Slack suite | AgentDojo Static Injection | Baseline ASR | 14.4 | 9 | 1mo ago |
| Tool-Completion Naive-e | CAHL | ASR | 15 | 7 | 1mo ago |
| Tool-Completion TCA-e | CAHL | ASR | 56 | 7 | 1mo ago |
| AgentDojo 13 non-agent benchmarks | TAP | Training Queries | 0 | 6 | 1mo ago |
| Outdoor Navigation (test) | | NE | 0 | 6 | 1mo ago |
| Real-world Overall (test) | PI3D | ASR | 64.8 | 2 | 1mo ago |
| Real-world Outdoor (test) | PI3D | ASR | 58.3 | 2 | 1mo ago |
| Office Real-world (test) | PI3D | ASR | 54.2 | 2 | 1mo ago |
| Real-world Home (test) | PI3D | ASR | 88.3 | 2 | 1mo ago |