| Creative Writing Evaluation Set 1.0 (test) | Phi-3-Mini + Top-p | Creativity 8.6 | | 54 | 22d ago |
| WildBench | | WildBench Score 83.9 | | 49 | 2d ago |
| Alpaca SFT Short Stories | | Self-BLEU (Diversity) 3.7 | | 36 | 1mo ago |
| Creative Writing | Min-k | Win Rate 58.6 | | 36 | 1mo ago |
| Creative Writing v3 | Hybrid Reward | Overall Rubric Score 83.3 | | 32 | 5d ago |
| CreativeWriting v3 | SPARD | LLM Judge 73.95 | | 26 | 1mo ago |
| Arena-Hard Creative Writing v2 | Gemini-2.5-Pro | Score 90.8 | | 25 | 3mo ago |
| Creative writing (test) | | Creativity 90.38 | | 20 | 1mo ago |
| Overall Average Poem, Joke, Story | VS-Standard | Semantic Diversity 0.3603 | | 20 | 2mo ago |
| Story | VS-Standard | Semantic Diversity 38.6 | | 20 | 2mo ago |
| Poem | | Semantic Diversity 31.22 | | 20 | 2mo ago |
| Blessing short-text | | Score 94.6 | | 18 | 1mo ago |
| WritingBench | AMARIS | Score 57.9 | | 18 | 14d ago |
| Arena Hard | Online DPO with WILDREWARD | Win Rate 63.5 | | 18 | 1mo ago |
| Creative Writing | Self-Refine | Solved Rate 51.98 | | 16 | 2mo ago |
| WildBench (test) | | WildBench Score 64.4 | | 15 | 2mo ago |
| Mythos 100 prompts (test) | GPT-5 | Word Count 834 | | 13 | 1mo ago |
| ArenaHard creative writing v2.0 | DPWriter | WR Score 29 | | 13 | 3mo ago |
| Creative Writing EQ-Bench v3 | DPWriter | ELO 829.05 | | 13 | 3mo ago |
| CW | | Performance Score 86.97 | | 12 | 21h ago |
| Creative Writing | SFT+DPO+GRPO | Discovery Score 45.2 | | 12 | 3mo ago |
| Creative Writing Evaluation Set | Top-H | Metric 1 Score 8.95 | | 9 | 22d ago |
| Creative Writing Human Evaluation | Min-k | Human Preference Count 75 | | 9 | 1mo ago |
| ROCStories + WritingPrompts | QD-LLM | QD-Score 63.3 | | 6 | 21d ago |
| ROCStories + WritingPrompts Llama-3-70B (test) | QD-LLM | QD-Score 63.3 | | 6 | 21d ago |