| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak Defense | MiniGPT-4 | Attack Success Rate (ASR) | 7.32 | 36 |
| Jailbreaking | GPT-4o | ASR | 0.99 | 19 |
| Adversarial Attack | GPT-4o | ASR | 0.6 | 14 |
| Jailbreaking | GPT 5.1 | ASR | 96.5 | 13 |
| Targeted Adversarial Attack | GPT 5.4 | ASR | 0 | 12 |
| Language Modeling | GPT Small (val) | Validation Perplexity | 27.95 | 12 |
| AI-generated table detection | GPT 5.2 (External Holdout) | AUROC | 88.3 | 12 |
| End-to-end inference tuning | GPT | Tuning Time (s) | 23.8 | 9 |
| Transfer Attack | GPT-5 | Attack Success Rate | 18.67 | 9 |
| Transfer Attack | GPT 4.1 | Attack Success Rate (ASR) | 4.33 | 9 |
| Targeted Attack | GPT closed-source standard MLLMs 5.4 | ASR | 3.8 | 8 |
| Targeted Attack | GPT closed-source standard MLLMs 5.2 | ASR | 3.3 | 8 |
| Targeted Attack | GPT-4o 5.2 (test) | Attack Success Rate (ASR) | 46.9 | 8 |
| Language Modeling | GPT Pre-training (val) | Validation Perplexity | 19.98 | 8 |
| Detection of paraphrased text | GPT Paraphrased 4.1 | ROC AUC (1% FPR) | 0.3977 | 8 |
| Language Modeling | GPT nano (val) | Validation Loss | 3.25 | 5 |
| Jailbreak | GPT 4.1 8 July 2025 release | ASR | 99.8 | 5 |
| Text-to-Video Generation | GPT-G | Semantic Objective | 76.8 | 4 |
| Machine-generated text detection | GPT-3.5 (test) | Accuracy | 99.14 | 4 |
| Summary Similarity Evaluation | GPT generated summaries 5.1 | BERTScore-F1 | 86.5 | 3 |
| Text Generation | MiniGPT-4 | BLEU-1 | 48.1 | 3 |
| AI-generated paper detection | GPT Clean Holdout 5.2 (test) | AUROC | 0.8857 | 1 |