| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Tool-Completion (TCA) | CAHL | ASR 0.12 | 14 | 4d ago | |
| NavGPT (test) | | Navigation Error 7.07 | 12 | 4d ago | |
| AgentDojo Slack suite | AgentDojo Static Injection | Baseline ASR 14.4 | 9 | 4d ago | |
| Tool-Completion Naive-e | CAHL | ASR 15 | 7 | 4d ago | |
| Tool-Completion TCA-e | CAHL | ASR 56 | 7 | 4d ago | |
| Outdoor Navigation (test) | | NE 0 | 6 | 4d ago | |
| Real-world Overall (test) | PI3D | ASR 64.8 | 2 | 4d ago | |
| Real-world Outdoor (test) | PI3D | ASR 58.3 | 2 | 4d ago | |
| Office Real-world (test) | PI3D | ASR 54.2 | 2 | 4d ago | |
| Real-world Home (test) | PI3D | ASR 88.3 | 2 | 4d ago | |