| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | C4 | Perplexity | 4.77 | 1,182 |
| Language Modeling | C4 (val) | PPL | 5.709 | 392 |
| Language Modeling | C4 | Perplexity | 1 | 321 |
| Language Modeling | C4 (test) | Perplexity | 4.97 | 268 |
| Language Modeling | C4 | C4 Loss | 2.55 | 73 |
| Language Model Pre-training | C4 Llama 2 pre-training (val) | Perplexity | 13.19 | 47 |
| Language Modeling | C4 | Log-PPL | 2.834 | 35 |
| Masked Language Modeling | C4 (val) | PPLX | 3.828 | 35 |
| Watermark Detectability | C4 RealNewsLike (Del-0.2) (test) | AUC | 99.3 | 28 |
| Language Modeling | C4 LLaMA-130M (val) | Perplexity | 18.504 | 27 |
| Language Modeling | C4 Qwen2.5 (val) | Perplexity (PPL) | 15.8 | 27 |
| Text Watermarking | C4 | PPL | 9.012 | 27 |
| Watermark Detection | c4 subset | Accuracy | 100 | 24 |
| Detection Accuracy | c4 subset | Accuracy | 100 | 24 |
| Watermark Detection | C4 subset | Accuracy | 100 | 24 |
| Language Model Pre-training | C4 Llama-160M scratch (val) | Validation Loss | 3.0908 | 20 |
| Spoofing Attack Robustness | C4 RealNewsLike | AUC | 0.9284 | 20 |
| Paraphrase Attack Robustness | C4 RealNewsLike | AUC | 0.9871 | 20 |
| Multi-bit LLM Watermarking | C4 GEMMA2-9B-BASE Max 256 Tokens | AUC | 1 | 20 |
| Multi-bit LLM Watermarking | C4 GEMMA2-9B-BASE Max 128 Tokens | AUC | 100 | 20 |
| Multi-bit LLM Watermarking | C4 LLaMA3-8B-BASE Max 256 Tokens | AUC | 100 | 20 |
| Multi-bit LLM Watermarking | C4 LLaMA3-8B-BASE Max 128 Tokens | AUC | 1 | 20 |
| Language Modeling | C4 T5 (val) | PPLX | 15.82 | 20 |
| Watermark Segment Classification | C4 Mistral-7B (val) | TPR | 100 | 18 |
| Watermark Segment Classification | C4 Llama-7B (val) | TPR | 100 | 18 |