| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| WikiText2 | | Perplexity 3.33 | 151 | 22d ago | |
| C4 | | Perplexity 5.62 | 54 | 22d ago | |
| Wiki | | Perplexity 3.53 | 54 | 22d ago | |
| CNN/DailyMail | E2E Caprese | Accuracy 27.16 | 35 | 1mo ago | |
| XSum | E2E Caprese | Accuracy 24.89 | 35 | 1mo ago | |
| QASPER | E2E Caprese | Accuracy 15.35 | 35 | 1mo ago | |
| CoQA | E2E Caprese | Accuracy 65.5 | 35 | 1mo ago | |
| WikiText-103 | Dream | Perplexity (PPL) 1 | 25 | 4d ago | |
| C4 (val) | | OLMo Perplexity 19.2 | 15 | 1mo ago | |
| P3B3 | AMALIA-9B-DPO | General Score 95.9 | 14 | 19d ago | |
| ALBA | Gemma 3-12B | Score 51.1 | 14 | 19d ago | |
| PT-Exams Open Questions | Qwen 3-8B | Score 77.3 | 14 | 19d ago | |
| Vicuna (test) | | ROUGE-L 19.4 | 14 | 1mo ago | |
| Self-Instruct (test) | | ROUGE-L 23.4 | 14 | 1mo ago | |
| Dolly databricks 15k (test) | | ROUGE-L 29.7 | 14 | 1mo ago | |
| Goodreads Books (test) | P-GRPO | ROUGE-1 61.38 | 12 | 1mo ago | |
| Synthetic Data (test) | P-GRPO | ROUGE-1 62.67 | 12 | 1mo ago | |
| OpenWebText (val) | DGLM | OLMo Perplexity 14.2 | 8 | 1mo ago | |
| Experimental Setup | GPT-2 | Relative Runtime 1 | 8 | 1mo ago | |
| WikiText-2 (test) | | Perplexity 3.319 | 8 | 1mo ago | |
| WebText (completions) | Coherence Tuning | Perplexity (PPL) 10.16 | 7 | 1mo ago | |
| KGRec (test) | P-GRPO | ROUGE-1 56.18 | 6 | 1mo ago | |
| KGRec-Music (test) | P-GRPO | ROUGE-1 56.18 | 6 | 1mo ago | |
| TinyStories (test) | GPT-4 | Grammar 9.93 | 5 | 1mo ago | |
| Synthetic data v1 (test) | RankGAN | NLL 8.247 | 4 | 1mo ago | |