| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LGT Detection | WritingPrompts small Fast-DetectGPT benchmark (test) | AUROC 99.9 | 54 |
| LGT Detection | WritingPrompts-small Fast-DetectGPT benchmark | AUROC 99.9 | 54 |
| Language Modeling | WritingPrompts (test) | Diversity (div) 88 | 14 |
| Open-ended Text Generation | WritingPrompts | PPL 1.76 | 10 |
| Text Generation | WritingPrompts (WP) (test) | BLEU-1 0.224 | 10 |
| Output Sequence Length Prediction | WritingPrompts super-long sequences (> 17k tokens) OOD | MAE 195.89 | 8 |
| LLM-generated text detection | WritingPrompts Fast-DetectGPT | AUROC 98.8 | 5 |
| Story Generation Evaluation | WritingPrompts (WP) (test) | Fascination 73.88 | 2 |
| Open-ended Text Generation | WritingPrompts (test) | Same Count 85 | 2 |