| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LGT Detection | WritingPrompts small Fast-DetectGPT benchmark (test) | AUROC 99.9 | 54 |
| LGT Detection | WritingPrompts-small Fast-DetectGPT benchmark | AUROC 99.9 | 54 |
| Language Modeling | WritingPrompts | MAUVE 32 | 33 |
| Machine-generated text detection | WritingPrompts | AUROC 1 | 30 |
| LLM Text Attribution | WritingPrompts | TPR (FPR=0.01) 100 | 18 |
| Story Generation | WritingPrompts | Brier Accuracy (BA) 98.44 | 16 |
| Text Generation | WritingPrompts | ROUGE-1 34.6 | 15 |
| Detection of LLM-Generated Text | WritingPrompts GPT-J-6B | AUROC 97.6 | 15 |
| Watermark Detection | WritingPrompts English (test) | TPR@FPR5% 98.8 | 15 |
| Language Modeling | WritingPrompts (test) | Diversity (div) 88 | 14 |
| Multi-branch story generation | WritingPrompts | Diversity 3.7 | 10 |
| Text generation | WritingPrompts | F1 Score 22.11 | 10 |
| Open-ended Text Generation | WritingPrompts | PPL 1.76 | 10 |
| Text Generation | WritingPrompts (WP) (test) | BLEU-1 0.224 | 10 |
| Narrative Script Refinement | WritingPrompts | Character Development 20.64 | 8 |
| Output Sequence Length Prediction | WritingPrompts super-long sequences (> 17k tokens) OOD | MAE 195.89 | 8 |
| Story Generation | WritingPrompts (test) | PPL (Generation) 8.14 | 5 |
| AI-Generated Text Detection | WritingPrompts | TPR 94.23 | 5 |
| LLM-generated text detection | WritingPrompts Fast-DetectGPT | AUROC 98.8 | 5 |
| Multi-branch story generation | WritingPrompts | BLEU 58.6 | 4 |
| LLM-generated text detection | WritingPrompts Llama3-8B | TPR @ FPR=1% 100 | 3 |
| LLM-generated text detection | WritingPrompts GPT-4.1-mini | TPR @ FPR=1% 31.33 | 3 |
| Machine-generated text detection | WritingPrompts cross-source corruption DetectRL (test) | AUROC 84.15 | 3 |
| Story Generation Evaluation | WritingPrompts (WP) (test) | Fascination 73.88 | 2 |
| Open-ended Text Generation | WritingPrompts (test) | Same Count 85 | 2 |