| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | C4 | Perplexity 4.77 | 1,422 |
| Language Modeling | C4 | Perplexity 1 | 1,071 |
| Language Modeling | C4 (val) | PPL 5.709 | 514 |
| Language Modeling | C4 (test) | Perplexity 4.97 | 342 |
| Language Modeling | C4 | C4 Loss 2.55 | 121 |
| Pre-training | C4 (val) | Perplexity 17.8 | 58 |
| Language Generation | C4 | Perplexity 5.62 | 54 |
| Language Model Pre-training | C4 Llama 2 pre-training (val) | Perplexity 13.19 | 47 |
| Watermarking | C4 | TPR (FPR < 10^-4) 100 | 40 |
| Language Modeling | C4 | Entropy 1 | 39 |
| Watermark Detection | C4 | TPR @ 1% FPR 100 | 36 |
| Language Modeling | C4 | Log-PPL 2.834 | 35 |
| Masked Language Modeling | C4 (val) | PPLX 3.828 | 35 |
| Feature Space Preservation | C4 | Cosine Similarity 100 | 32 |
| Language Modeling | C4 | Word Perplexity 18.08 | 32 |
| Next Token Prediction | C4 (held-out) | Perplexity (PPL) 21.5 | 30 |
| Clustering | C4 | Clustering Score 63.95 | 30 |
| Next Token Prediction | C4 | OOD Perplexity 21.1 | 30 |
| Language Modeling | C4 | Perplexity 9.44 | 28 |
| Watermark Detectability | C4 RealNewsLike (Del-0.2) (test) | AUC 99.3 | 28 |
| Language Modeling | C4 LLaMA-130M (val) | Perplexity 18.504 | 27 |
| Language Modeling | C4 Qwen2.5 (val) | Perplexity (PPL) 15.8 | 27 |
| Text Watermarking | C4 | PPL 9.012 | 27 |
| Watermark Detection | C4 OPT-6.7B | ROC-AUC 100 | 26 |
| Watermark Detection | C4 | Detection Accuracy (No Attack) 100 | 24 |