| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| TrojanTools | | ASR | 47.1 | 18 | 4d ago |
| Combined Attacks | LLM-Detector | ASR | 0 | 18 | 4d ago |
| Ignore Instruction | LLM-Detector | ASR | 0 | 18 | 4d ago |
| Vision-Language Agentic IPI Benchmark (test) | | BU | 72 | 12 | 4d ago |
| Video Modality (test) | ARGUS | UIAinject | 38.5 | 10 | 4d ago |
| Image Modality (test) | Ignore Prompt | UIAinject | 24.5 | 10 | 4d ago |
| Audio Modality (test) | System Prompt | UIAinject | 7.5 | 9 | 4d ago |
| DoomArena | | BU | 73.57 | 4 | 4d ago |
| AgentDojo | | Banking Defense Rate | 51.39 | 3 | 4d ago |
| AgentDojo Out-of-Distribution (OOD) | | ADR | 0 | 2 | 4d ago |
| InjectAgent Out-of-Distribution (OOD) | | ADR | 0 | 2 | 4d ago |
| Important Messages DeepSeek V3.2 (test) | | ASR | 42.68 | 1 | 4d ago |
| Important Messages DeepSeek V3.1 (test) | MELON | ASR | 63 | 1 | 4d ago |