Long-form Answer Generation on StoryQA
[Chart: Spearman Correlation. Top entry: Merge, 0.9203 (Apr 13, 2026). Metrics: Spearman Correlation, Kendall Correlation.]
Updated 1mo ago
Evaluation Results
Method          Granularity     Date     Spearman Correlation  Kendall Correlation
Merge           Combined        2026.04  0.9203                0.8923
WPA             Instance-l...   2026.04  0.8767                0.8467
Coarse 3-level  Context-bound   2026.04  0.8303                0.7832
PCP             Instance-l...   2026.04  0.8125                0.7602
ROUGE-L         Instance-l...   2026.04  0.6346                0.5897
Coarse 5-level  Task-level      2026.04  0.6075                0.5647
Checklist       Instance-l...   2026.04  0.6061                0.5877
BLEU            Instance-l...   2026.04  0.5577                0.4872
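
For readers unfamiliar with the two columns, the sketch below shows how Spearman and Kendall rank correlations are computed in general. The page does not state what the scores are correlated against; correlating a method's scores with human ratings is a common setup and is only assumed here. The data, variable names, and helper functions are hypothetical, and the Kendall variant shown (tau-b) is one choice among several.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b: concordant minus discordant pairs, tie-corrected."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + tied_x_only) * (untied + tied_y_only)
    )


# Hypothetical data: automatic scores and human ratings for five answers.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
human_ratings = [1, 2, 3, 5, 4]

rho = spearman(metric_scores, human_ratings)
tau = kendall_tau_b(metric_scores, human_ratings)
print(f"Spearman: {rho:.4f}  Kendall: {tau:.4f}")
```

In this toy data one pair of answers is ranked in the opposite order by the two lists, which lowers both coefficients below 1. Kendall counts that single discordant pair directly, while Spearman penalizes the squared rank differences, so the two values differ.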