| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Benign completion reliability | Agent Security Bench Benign | Completion Reliability | 99 | 10 |
| Indirect Prompt Injection robustness | Agent Security Bench IPI | Attack Success Rate (ASR) | 2 | 10 |
| Direct Prompt Injection robustness | Agent Security Bench DPI | Attack Success Rate (ASR) | 19 | 10 |
| LLM Agent Security Evaluation | Agent Security Bench (test) | Benign Utility (BU) | 73.67 | 5 |