| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| Bio | | LLM-Judge Score 81 | 59 | 16d ago | |
| Long-form generation ID | HUQ-SATRMD | PRR 0.38 | 38 | 4d ago | |
| 1D-DiffTask most OOD | SATMD-MSP | PRR 0.24 | 24 | 4d ago | |
| DiffTask OOD | HUQ-SATRMD | PRR 0.16 | 24 | 4d ago | |
| Long-form generation datasets 1D-SameTask - OOD | SATMD-MSP | PRR 0.16 | 24 | 4d ago | |
| Long-form generation datasets LOO - near OOD | HUQ-SATRMD | PRR 0.3 | 24 | 4d ago | |
| LongGenBench | | CR 80.03 | 24 | 1mo ago | |
| 1D-DiffTask | SATMD-MSP | PRR 0.24 | 14 | 4d ago | |
| DiffTask | HUQ-SATRMD | PRR 0.16 | 14 | 4d ago | |
| Long-form generation 1D-SameTask | SATMD-MSP | PRR 0.16 | 14 | 4d ago | |
| Long-form generation LOO | HUQ-SATRMD | PRR 0.3 | 14 | 4d ago | |
| LongBench Write-en | GPT-4o | Average Sequence Length Success Rate 92.2 | 13 | 1mo ago | |
| Bio | Decoding_c | PIA RLLMJ Score 69.8 | 6 | 16d ago | |
| WritingBench | | Score 5.1 | 6 | 20d ago | |
| FreshWiki | Agentic Reasoning | ROUGE-1 54.1 | 6 | 1mo ago | |