| Dataset Name | SOTA Method | Metric | Value | Trend | Last updated |
|---|---|---|---|---|---|
| AlpacaFarm (test) | Delim. | Attack Success Rate | 0 | 105 | 2mo ago |
| Amazon Reviews | ObliInjection | ASR | 99.8 | 47 | 3mo ago |
| HotpotQA | ObliInjection-CE | ASR | 100 | 42 | 3mo ago |
| Multi-News | JudgeDeceiver | ASR | 100 | 42 | 3mo ago |
| Agent Action Subset 2 | | IR | 0 | 24 | 16d ago |
| LLM Behavior Subset 1 | Actor-Critic | IR | 99.8 | 24 | 16d ago |
| InjectAgent | IterInject | Direct Harm ASR | 90.83 | 12 | 8d ago |
| InjecAgent | Sanitizer | Base ASR | 0.3 | 12 | 2mo ago |
| AgentDojo | AGENTSYS | Benign Utility | 64.36 | 12 | 3mo ago |
| QA Agent | MINJA | ISR | 100 | 9 | 3mo ago |
| LivePI Total | | ASR | 10.7 | 5 | 15d ago |
| LivePI Gist (n=50) | | ASR | 0 | 5 | 15d ago |
| LivePI Repo Links (n=4) | | ASR | 50 | 5 | 15d ago |
| LivePI Local Docs (n=50) | | ASR | 50 | 5 | 15d ago |
| LivePI Email (n=50) | | ASR | 20 | 5 | 15d ago |
| LivePI Group chat (n=15) | | ASR | 100 | 5 | 15d ago |
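Most rows above report an attack success rate (ASR) on a 0 to 100 scale. The sketch below shows the conventional computation, assuming each test case is a single attempt scored as success or failure; the function name `attack_success_rate` is hypothetical, and the individual benchmarks may judge success, average over runs, or weight subsets differently.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the percentage of attack attempts marked successful.

    Assumes one boolean per attempt (True = the injected instruction was
    followed). How "success" is judged varies by benchmark and is not
    modeled here.
    """
    if not outcomes:
        raise ValueError("need at least one attempt")
    # sum() over booleans counts the True values (successful attempts)
    return 100.0 * sum(outcomes) / len(outcomes)


# Example: 2 successes out of 4 attempts
print(attack_success_rate([True, False, True, False]))  # 50.0
```

Under this one-attempt-per-case assumption, a subset with n=4 can only take ASR values in steps of 25 points, so small subsets such as LivePI Repo Links give a coarse estimate.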