| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Creative Writing | Creative Writing | Win Rate | 58.6 | 36 |
| Creative Writing | Creative writing (test) | Creativity | 90.38 | 20 |
| Creative Writing | Creative Writing | Solved Rate | 51.98 | 16 |
| Creative Writing | Creative Writing EQ-Bench v3 | ELO | 829.05 | 13 |
| Creative Writing | Creative Writing | Discovery Score | 45.2 | 12 |
| Creative Writing | Creative Writing v3 | Overall Rubric Score | 52.8 | 10 |
| Creative Writing | Creative Writing Human Evaluation | Human Preference Count | 75 | 9 |
| AI Text Detection | Creative Writing | AUC | 99.9 | 7 |
| Creative Writing | Creative Writing | Win Rate vs Confidence | 70.1 | 6 |
| AI-generated text detection | Creative Writing Out-of-Domain | F1 Score | 95.7 | 5 |
| AI-generated text detection | Creative Writing In-Domain | F1 Score | 98.4 | 5 |
| Creative Writing | Creative Writing Alpaca-Eval 100 problems 2.0 | Length-Controlled Win Rate | 93.81 | 4 |