| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak Defense | MiniGPT-4 | Attack Success Rate (ASR) | 7.32 | 36 |
| Jailbreaking | GPT-4o | ASR | 0.99 | 19 |
| Language Modeling | GPT Small (val) | Validation Perplexity | 27.95 | 12 |
| AI-generated table detection | GPT 5.2 (External Holdout) | AUROC | 88.3 | 12 |
| Adversarial Attack | GPT-4o | ASR | 3.8 | 11 |
| Jailbreaking | GPT 5.1 | ASR | 96.5 | 9 |
| Detection of paraphrased text | GPT Paraphrased 4.1 | ROC AUC (1% FPR) | 0.3977 | 8 |
| Jailbreak | GPT 4.1 8 July 2025 release | ASR | 99.8 | 5 |
| Text-to-Video Generation | GPT-G | Semantic Objective | 76.8 | 4 |
| Machine-generated text detection | GPT-3.5 (test) | Accuracy | 99.14 | 4 |
| Summary Similarity Evaluation | GPT generated summaries 5.1 | BERTScore-F1 | 86.5 | 3 |
| Text Generation | MiniGPT-4 | BLEU-1 | 48.1 | 3 |
| AI-generated paper detection | GPT Clean Holdout 5.2 (test) | AUROC | 0.8857 | 1 |