| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| CyberSecEval 2 | GPT-5-Mini-R | Robustness | 91 | 2 | 1mo ago |
| Internal PI Benchmark | GPT-5-Mini-R | Robustness | 100 | 2 | 1mo ago |
| Coding prompt injections | gpt-5-thinking | Score | 97 | 2 | 1mo ago |
| Tool calling prompt injections | gpt-5-thinking | Robustness Score | 99 | 2 | 1mo ago |
| Browsing prompt injections | gpt-5-thinking | Robustness Score | 99 | 2 | 1mo ago |