Creative Writing on Creative Writing Alpaca-Eval 100 problems 2.0
[Chart: Length-Controlled Win Rate, with Standard Error. Top result: ICRL prompting, 93.81, May 21, 2025.]
Updated 23d ago
Evaluation Results
| Method | Method details | Links | Length-Controlled Win Rate | Standard Error |
|---|---|---|---|---|
| ICRL prompting | Comparison baseline=Be... | 2025.05 | 93.81 | 1.01 |
| ICRL prompting | Comparison baseline=Se... | 2025.05 | 86.32 | 3.03 |
| ICRL prompting | Comparison baseline=Lo... | 2025.05 | 78.36 | 1.99 |
| ICRL prompting | Comparison baseline=Re... | 2025.05 | 59.48 | 3.47 |