| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Machine-Generated Text Detection | MAGE | AUROC (Avg) 99.1 | 24 |
| Detection Evasion | MAGE | ASR 99.9 | 18 |
| Detection of Machine-Generated Text | MAGE Main Experimental Supplement (test) | TP@20% 85.12 | 14 |
| Detection of Machine-Generated Text | MAGE (test) | TP @ 20% Threshold 85.12 | 14 |
| Machine-Generated Text Detection | MAGE COLING2025 (val) | AUC 79.55 | 13 |
| Paraphrase Quality Assessment | MAGE shared subset (evaluation 300 AI-written samples) | PPL 16.25 | 12 |
| AI Detector Evasion | MAGE (evaluation set) | ASR (τ=0.5) 91.3 | 12 |
| Machine-generated text detection | MAGE Unseen Domains & Unseen Model (test) | Human Recall 95.65 | 9 |
| AI-generated text detection | MAGE BigScience 1.0 (test) | Accuracy 96.7 | 8 |
| AI-generated text detection | MAGE GLM 1.0 (test) | Accuracy 94.1 | 8 |
| AI-generated text detection | MAGE OPT 1.0 (test) | Accuracy 89.1 | 8 |
| AI-generated text detection | MAGE (LLaMA) 1.0 (test) | Accuracy 88 | 8 |
| AI-generated text detection | MAGE GPT 1.0 (test) | Accuracy 82.7 | 8 |
| AI-generated text detection | MAGE FLAN-T5 1.0 (test) | Accuracy 68.9 | 8 |
| Detection of LLM-generated text | MAGE Topic-based 3.5-turbo | Detection Accuracy 100 | 8 |
| Detection of LLM-generated text | MAGE News Topic-based 3.5-turbo | Detection Performance 99.95 | 8 |
| LLM-generated text detection | MAGE QA short text (<= 30 words) | AUROC 0.9747 | 8 |
| LLM-generated text detection | MAGE News short text (<= 30 words) | AUROC 93.48 | 8 |
| Detection of LLM generated text | MAGE QA | ROC AUC (FPR=1%) 65.33 | 8 |
| Detection of LLM generated text | MAGE News | ROC AUC @ FPR=1% 0.6577 | 8 |
| LLM-generated text detection | MAGE DIPPER attack | Human Score 77.44 | 8 |
| Machine-generated text detection | MAGE Arbitrary-domains & Arbitrary-models (test) | Human Recall 0.9572 | 5 |
| Machine-generated text detection | MAGE Paraphrasing Attack (test) | Human Recall 79.66 | 4 |
| AI-Generated Text Detection | MAGE DeepSeek-R1 OOD | Accuracy 71 | 3 |
| AI-Generated Text Detection | MAGE Claude-sonnet-4-5 OOD | Accuracy 57.1 | 3 |