| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| Inj-TriviaQA | StruQ | Naive ASR 0.11 | 21 | 9d ago | |
| TrojanTools | | ASR 47.1 | 18 | 1mo ago | |
| Combined Attacks | LLM-Detector | ASR 0 | 18 | 1mo ago | |
| Ignore Instruction | LLM-Detector | ASR 0 | 18 | 1mo ago | |
| Vision-Language Agentic IPI Benchmark (test) | | BU 72 | 12 | 1mo ago | |
| Video Modality (test) | ARGUS | UIAinject 38.5 | 10 | 1mo ago | |
| Image Modality (test) | Ignore Prompt | UIAinject 24.5 | 10 | 1mo ago | |
| Audio Modality (test) | System Prompt | UIAinject 7.5 | 9 | 1mo ago | |
| AgentDojo Banking and Slack suites | | BU (No Attack) 35.29 | 4 | 1mo ago | |
| DoomArena | | BU 73.57 | 4 | 1mo ago | |
| AgentDojo (Full) | MetaSecAlign | BU (NOATTACK) 78.26 | 3 | 1mo ago | |
| AgentDojo | | Banking Defense Rate 51.39 | 3 | 1mo ago | |
| AgentDojo Out-of-Distribution (OOD) | | ADR 0 | 2 | 1mo ago | |
| InjectAgent Out-of-Distribution (OOD) | | ADR 0 | 2 | 1mo ago | |
| Important Messages DeepSeek V3.2 (test) | | ASR 42.68 | 1 | 1mo ago | |
| Important Messages DeepSeek V3.1 (test) | MELON | ASR 63 | 1 | 1mo ago | |