| Dataset Name | SOTA Method | Metric (Value) | Trend | Updated | |
|---|---|---|---|---|---|
| DomainBench | SYTTA-16 | BLEU (Agriculture) 71.37 | 144 | 2mo ago | |
| OpenWebText | SDTT | Perplexity 3.18 | 142 | 8d ago | |
| LM1B (test) | UDLM | Entropy 2.46 | 85 | 22d ago | |
| NoveltyBench | | Diversity 10 | 81 | 29d ago | |
| CNN/Daily Mail (test) | DEL | COH Score 83 | 64 | 3mo ago | |
| OpenWebText | IDLM-MDLM | Gen PPL 11.312 | 54 | 1d ago | |
| WikiText-2 | | Perplexity 10.98 | 50 | 1mo ago | |
| Medical Chatbot | | ASR 100 | 42 | 3mo ago | |
| OWT | DFM (ESD) | GPT2 Perplexity 5.33 | 41 | 1mo ago | |
| 5 Generation tasks | POP | Accuracy 57.96 | 36 | 3mo ago | |
| GSM8K | Llama 3.1 8B Instruct | Accuracy 84.99 | 35 | 21h ago | |
| Text Generation | | PPL 11.9 | 33 | 3mo ago | |
| Text model inference M4 Max | vllm-mlx | Throughput (tok/s) 525.5 | 31 | 3mo ago | |
| MSCOCO | SARE | BLEU-1 57.2 | 26 | 3mo ago | |
| LM1B | DFM (ESD) | Perplexity (PPL) 68.11 | 24 | 1mo ago | |
| IFEval | Llama 3.1 8B Instruct | Accuracy 74.49 | 23 | 4d ago | |
| Wikitext-103 | Refined by Gemma3 27B | Perplexity 32.88 | 23 | 1mo ago | |
| Spec-Bench Overall | SpecBound | SD Score 2.33 | 21 | 1mo ago | |
| MMLU (test) | TAP | BS Score 57.83 | 20 | 22d ago | |
| AbGen | ROMA | Importance 4.91 | 20 | 3mo ago | |
| ShareGPT | HiSpec | Speedup vs AR 2.01 | 19 | 6d ago | |
| Open Web Text (OWT) (val) | Masked D-MMD | GPT-2 GM Score 0.456 | 19 | 2mo ago | |
| Aggregate NLP Tasks (GEC, Smart Reply, Summarization, Tone Adjustment, QA) (test) | | Average Score 32.9 | 18 | 3mo ago | |
| WebNLG seen categories (test) | CGE-LW | BLEU 63.69 | 18 | 3mo ago | |
| Hazard Detection (val) | Qwen2-VL-7B ft | BLEU-4 0.658 | 17 | 2mo ago | |