| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Creative Writing | Creative Writing | Win Rate: 58.6 | 36 |
| Creative Writing | Creative Writing v3 | Overall Rubric Score: 83.3 | 32 |
| Creative Writing | Creative writing (test) | Creativity: 90.38 | 20 |
| Creative Writing | Creative Writing | Solved Rate: 51.98 | 16 |
| Creative Writing | Creative Writing EQ-Bench v3 | ELO: 829.05 | 13 |
| Creative Writing | Creative Writing | Discovery Score: 45.2 | 12 |
| Creative Writing Generation | Creative Writing | Score: 71.58 | 10 |
| Creative Writing | Creative Writing Evaluation Set | Metric 1 Score: 8.95 | 9 |
| Creative Writing | Creative Writing Human Evaluation | Human Preference Count: 75 | 9 |
| AI Text Detection | Creative Writing | AUC: 99.9 | 7 |
| Creative Writing | Creative Writing | GPT-4 Coherence Score: 7.91 | 6 |
| Creative Writing | Creative Writing | Win Rate vs Confidence: 70.1 | 6 |
| AI-generated text detection | Creative Writing Out-of-Domain | F1 Score: 95.7 | 5 |
| AI-generated text detection | Creative Writing In-Domain | F1 Score: 98.4 | 5 |
| Style & chat | Creative Writing v3 | Score: 83.8 | 4 |
| Creative Writing | Creative Writing | Vendi Score: 1.67 | 4 |
| Creative Writing | Creative Writing Alpaca-Eval 100 problems 2.0 | Length-Controlled Win Rate: 93.81 | 4 |