| Dataset Name | SOTA Method | Metric | Best Result | Papers | Last Updated |
|---|---|---|---|---|---|
| WikiText2 | — | Perplexity | 3.53 | 36 | 4d ago |
| CNN/DailyMail | E2E Caprese | Accuracy | 27.16 | 35 | 4d ago |
| XSum | E2E Caprese | Accuracy | 24.89 | 35 | 4d ago |
| QASPER | E2E Caprese | Accuracy | 15.35 | 35 | 4d ago |
| CoQA | E2E Caprese | Accuracy | 65.5 | 35 | 4d ago |
| C4 (val) | — | OLMo Perplexity | 19.2 | 15 | 4d ago |
| OpenWebText (val) | DGLM | OLMo Perplexity | 14.2 | 8 | 4d ago |
| Experimental Setup | GPT-2 | Relative Runtime | 1 | 8 | 3d ago |
| WikiText-2 (test) | — | Perplexity | 3.319 | 8 | 4d ago |
| WebText (completions) | Coherence Tuning | Perplexity (PPL) | 10.16 | 7 | 4d ago |
| TinyStories (test) | GPT-4 | Grammar | 9.93 | 5 | 4d ago |
| Synthetic data v1 (test) | RankGAN | NLL | 8.247 | 4 | 4d ago |
| CommonGen | DeepSeek-V3.2 | Accuracy | 67.34 | 3 | 4d ago |
| Large-scale model pool (15 LLMs) | RouteMoA | Accuracy | 51.9 | 3 | 4d ago |
| COCO (val) | RankGAN | BLEU-2 | 84.5 | 3 | 4d ago |
| One Billion Word (6-gram) | TTUR | JSD | 0.74 | 2 | 4d ago |
| One Billion Word (4-gram) | TTUR | JSD | 0.35 | 2 | 4d ago |