| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| LGT Detection | WritingPrompts small Fast-DetectGPT benchmark (test) | AUROC | 99.9 | 54 |
| LGT Detection | WritingPrompts-small Fast-DetectGPT benchmark | AUROC | 99.9 | 54 |
| Machine-generated text detection | WritingPrompts | AUROC | 1 | 30 |
| LLM Text Attribution | WritingPrompts | TPR (FPR=0.01) | 100 | 18 |
| Watermark Detection | WritingPrompts English (test) | TPR@FPR 5% | 98.8 | 15 |
| Language Modeling | WritingPrompts (test) | Diversity (div) | 88 | 14 |
| Text generation | WritingPrompts | F1 Score | 22.11 | 10 |
| Open-ended Text Generation | WritingPrompts | PPL | 1.76 | 10 |
| Text Generation | WritingPrompts (WP) (test) | BLEU-1 | 0.224 | 10 |
| Narrative Script Refinement | WritingPrompts | Character Development | 20.64 | 8 |
| Output Sequence Length Prediction | WritingPrompts super-long sequences (> 17k tokens) OOD | MAE | 195.89 | 8 |
| LLM-generated text detection | WritingPrompts Fast-DetectGPT | AUROC | 98.8 | 5 |
| Story Generation Evaluation | WritingPrompts (WP) (test) | Fascination | 73.88 | 2 |
| Open-ended Text Generation | WritingPrompts (test) | Same Count | 85 | 2 |