| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| AI-generated text detection | RAID (test) | TP @ 20% Error Threshold | 78.4 | 42 |
| Machine-text detection | RAID-MovieReviews | AUROC | 0.96 | 21 |
| Machine-text detection | RAID ArXiv | AUROC | 99 | 21 |
| Machine Text Detection | RAID | AUC | 0.954 | 10 |
| LLM-generated text detection | RAID Wikipedia-related samples | GPT-4 Performance Score | 99.35 | 8 |
| LLM-generated text detection | RAID Wikipedia Paraphrased Phi-4 | ROC AUC | 0.8675 | 8 |
| LLM-generated text detection | RAID Wikipedia Paraphrased Grok-3-mini | ROC AUC | 0.8906 | 8 |
| LLM-generated text detection | RAID Wikipedia Paraphrased DeepSeek-V3-0324 | ROC AUC | 0.8926 | 8 |
| LLM-generated text detection | RAID Wikipedia Paraphrased GPT-4.1 | ROC AUC | 0.9173 | 8 |
| LLM-generated text detection | RAID Wikipedia Paraphrased GPT-4o-mini | ROC AUC | 0.9073 | 8 |
| LLM-generated text detection | RAID Wikipedia-related (all) | GPT-4 Score | 89.12 | 8 |
| LLM-generated text detection | RAID Wikipedia-related | GPT-4 Score | 99.94 | 8 |
| Composite Text Detection | RAID Paraphrase and Revise | ROC AUC | 89.84 | 8 |
| Composite Text Detection | RAID Human and Revise | ROC AUC | 0.7907 | 8 |
| Composite Text Detection | RAID Human and Paraphrase | ROC AUC | 0.7866 | 8 |
| LLM-generated text detection | RAID Reviews | ROC AUC | 1 | 8 |
| LLM-generated text detection | RAID Reddit | ROC AUC | 99.92 | 8 |
| LLM-generated text detection | RAID Recipe | ROC AUC | 0.9999 | 8 |
| LLM-generated text detection | RAID Poetry | ROC AUC | 100 | 8 |
| LLM-generated text detection | RAID Abstract | ROC AUC | 100 | 8 |
| LLM-generated text detection | RAID Books | ROC AUC | 100 | 8 |
| LLM-generated text detection | RAID News | ROC AUC | 100 | 8 |
| AI Text Detection | RAID (in-domain) | Accuracy | 95.98 | 5 |
| Machine-Generated Text Detection | RAID | TP @ 20% | 78.5 | 4 |