| Dataset Name | SOTA Method | Metric | Value | | Updated |
|---|---|---|---|---|---|
| Harmful prompts dataset | | Attack Success Rate | 97 | 49 | 1mo ago |
| MultiJail | Self Defense | ASR (EN) | 13.02 | 18 | 1mo ago |
| AdvBench-x | SmoothLLM | ASR (English) | 7.94 | 18 | 1mo ago |
| VAJA | ZO-SPSA | Identity Attack Success Rate | 92.3 | 15 | 1mo ago |
| HEx-PHI (test) | CS-DJ | ASR Category 1 | 86.67 | 12 | 1mo ago |
| HarmBench (50 randomly sampled questions) | GPT-OSS-20B | ASR | 84 | 8 | 1mo ago |
| AdvBench LLaMA-3.1-70B | HMNS | ASR (SMO, GPT-4o) | 39.7 | 5 | 4d ago |
| AdvBench Phi-3 Medium 14B Instruct | HMNS | ASR (SMO, GPT-4o) | 41 | 5 | 4d ago |
| AdvBench LLaMA-2-7B-Chat | HMNS | ASR (SMO, GPT-4o) | 40 | 5 | 4d ago |