| Dataset Name | SOTA Method | Metric | Value | Trend | | |
|---|---|---|---|---|---|---|
| Bio | | LLM-Judge Score | 81 | 59 | 2mo ago | |
| Long-form generation ID | HUQ-SATRMD | PRR | 0.38 | 38 | 1mo ago | |
| Alpaca | I-GLASS | Perplexity (PPL) | 2.4268 | 30 | 20d ago | |
| 1D-DiffTask most OOD | SATMD-MSP | PRR | 0.24 | 24 | 1mo ago | |
| DiffTask OOD | HUQ-SATRMD | PRR | 0.16 | 24 | 1mo ago | |
| Long-form generation datasets 1D-SameTask - OOD | SATMD-MSP | PRR | 0.16 | 24 | 1mo ago | |
| Long-form generation datasets LOO - near OOD | HUQ-SATRMD | PRR | 0.3 | 24 | 1mo ago | |
| LongGenBench | | CR | 80.03 | 24 | 3mo ago | |
| 1D-DiffTask | SATMD-MSP | PRR | 0.24 | 14 | 1mo ago | |
| DiffTask | HUQ-SATRMD | PRR | 0.16 | 14 | 1mo ago | |
| Long-form generation 1D-SameTask | SATMD-MSP | PRR | 0.16 | 14 | 1mo ago | |
| Long-form generation LOO | HUQ-SATRMD | PRR | 0.3 | 14 | 1mo ago | |
| LongBench Write-en | GPT-4o | Average Sequence Length Success Rate | 92.2 | 13 | 3mo ago | |
| MIA | Qwen2.5 | Score | 84.6 | 8 | 19d ago | |
| DetailCaps 100 sampled instances | InternVL3 | Score | 64.4 | 8 | 19d ago | |
| LLaVA-Bench | Qwen2.5 | Score | 96.6 | 8 | 19d ago | |
| Bio | Decoding_c | PIA RLLMJ Score | 69.8 | 6 | 2mo ago | |
| WritingBench | | Score | 5.1 | 6 | 2mo ago | |
| FreshWiki | Agentic Reasoning | ROUGE-1 | 54.1 | 6 | 3mo ago | |
| LongWriter-Bench | Elastic-dLLM | Success Rate (Sl) [0, 500) | 94.9 | 4 | 15d ago | |