| Dataset Name | SOTA Method | Metric | Value | Count | Updated |
|---|---|---|---|---|---|
| WikiText2 | | Perplexity | 2.9 | 287 | 14d ago |
| C4 | | Perplexity | 5.52 | 190 | 14d ago |
| Wiki | | Perplexity | 3.53 | 54 | 2mo ago |
| WikiText-103 | Dream | Perplexity (PPL) | 1 | 41 | 6d ago |
| CNN/DailyMail | E2E Caprese | Accuracy | 27.16 | 35 | 3mo ago |
| XSum | E2E Caprese | Accuracy | 24.89 | 35 | 3mo ago |
| QASPER | E2E Caprese | Accuracy | 15.35 | 35 | 3mo ago |
| CoQA | E2E Caprese | Accuracy | 65.5 | 35 | 3mo ago |
| LM1B 1024 sequences of length 128 | D3PM | Generative PPL | 186.79 | 20 | 6d ago |
| WikiText-2 (test) | | Perplexity | 3.319 | 16 | 1mo ago |
| C4 (val) | | OLMo Perplexity | 19.2 | 15 | 3mo ago |
| P3B3 | AMALIA-9B-DPO | General Score | 95.9 | 14 | 2mo ago |
| ALBA | Gemma 3-12B | Score | 51.1 | 14 | 2mo ago |
| PT-Exams Open Questions | Qwen 3-8B | Score | 77.3 | 14 | 2mo ago |
| Vicuna (test) | | ROUGE-L | 19.4 | 14 | 2mo ago |
| Self-Instruct (test) | | ROUGE-L | 23.4 | 14 | 2mo ago |
| Dolly databricks 15k (test) | | ROUGE-L | 29.7 | 14 | 2mo ago |
| Goodreads Books (test) | P-GRPO | ROUGE-1 | 61.38 | 12 | 2mo ago |
| Synthetic Data (test) | P-GRPO | ROUGE-1 | 62.67 | 12 | 2mo ago |
| OpenWebText (test) | | GenPPL (Oracle) | 14.8 | 9 | 8d ago |
| E2E | LMNet | BLEU | 70.5 | 9 | 19d ago |
| C4 sampled (test) | | Perplexity | 6.97 | 8 | 1mo ago |
| OpenWebText (val) | DGLM | OLMo Perplexity | 14.2 | 8 | 3mo ago |
| Experimental Setup | GPT-2 | Relative Runtime | 1 | 8 | 3mo ago |
| WebText (completions) | Coherence Tuning | Perplexity (PPL) | 10.16 | 7 | 3mo ago |