| Dataset Name | SOTA Method | Metric | | Updated | Trend |
|---|---|---|---|---|---|
| WikiText-2 | DEL | COH Score 0.811 | 112 | 4d ago | |
| PTB | DEL | COH Score 67.7 | 64 | 4d ago | |
| Law-MT Out of Domain (test) | FoSS | MAUVE 32.17 | 16 | 4d ago | |
| Scaling Data Store | FoSS | MAUVE 33.79 | 12 | 4d ago | |
| Story (test) | Typical Sampling | Diversity (DIV) 0.96 | 12 | 4d ago | |
| Wikinews (test) | Typical Sampling | Diversity (DIV) 0.95 | 12 | 4d ago | |
| Wikitext (test) | Typical Sampling | Diversity (DIV) 95 | 12 | 4d ago | |
| CEB | FairSteer | Sentiment Score 80 | 12 | 4d ago | |
| WritingPrompts | | PPL 1.76 | 10 | 4d ago | |
| Wikitext-103 | | PPL 2.55 | 10 | 4d ago | |
| Wikitext-103 (test) | DITTO | Win Rate 84 | 8 | 4d ago | |
| WildBench | CTC-trained MDLM | Score -1.7 | 4 | 4d ago | |
| MTBench | CTC-trained MDLM | LLM Judge Score 3.7 | 4 | 4d ago | |
| Creative-Writing-Bench v3 | CTC-trained MDLM | Score 27.4 | 4 | 4d ago | |
| Arena-hard Creative-Writing | CTC-trained MDLM | Pairwise Win Rate 80.2 | 4 | 4d ago | |
| Arena-hard Hard-Prompt | LLaDA-1.5 | Pairwise Win Rate 58.5 | 4 | 4d ago | |
| Chatbot Arena inspired qualitative prompts (val) | Mamba | ELO 1,150.78 | 4 | 4d ago | |
| HalluDial | +DFT+DFD | BERTScore 76.81 | 3 | 4d ago | |
| WritingPrompts (test) | MIXCE | Same Count 85 | 2 | 4d ago | |
| WebText (test) | MIXCE | Same Preference Count 97 | 2 | 4d ago | |