
Creative Writing

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Creative Writing | Creative Writing | Win Rate: 58.6 | 36 |
| Creative Writing | Creative Writing v3 | Overall Rubric Score: 83.3 | 32 |
| Creative Writing | Creative writing (test) | Creativity: 90.38 | 20 |
| Creative Writing | Creative Writing | Solved Rate: 51.98 | 16 |
| Creative Writing | Creative Writing EQ-Bench v3 | ELO: 829.05 | 13 |
| Creative Writing | Creative Writing | Discovery Score: 45.2 | 12 |
| Creative Writing Generation | Creative Writing | Score: 71.58 | 10 |
| Creative Writing | Creative Writing Evaluation Set | Metric 1 Score: 8.95 | 9 |
| Creative Writing | Creative Writing Human Evaluation | Human Preference Count: 75 | 9 |
| AI Text Detection | Creative Writing | AUC: 99.9 | 7 |
| Creative Writing | Creative Writing | GPT-4 Coherence Score: 7.91 | 6 |
| Creative Writing | Creative Writing | Win Rate vs Confidence: 70.1 | 6 |
| AI-generated text detection | Creative Writing Out-of-Domain | F1 Score: 95.7 | 5 |
| AI-generated text detection | Creative Writing In-Domain | F1 Score: 98.4 | 5 |
| Style & chat | Creative Writing v3 | Score: 83.8 | 4 |
| Creative Writing | Creative Writing | Vendi Score: 1.67 | 4 |
| Creative Writing | Creative Writing Alpaca-Eval 100 problems 2.0 | Length-Controlled Win Rate: 93.81 | 4 |