LongGenBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-form Generation | LongGenBench | CR 80.03 | 24 |
| Long-context reasoning | LongGenBench 8K | GSM8K Score 44.51 | 22 |
| Long-context reasoning | LongGenBench 4K | GSM8K Score 53.18 | 22 |
| Long-context Question Answering | LONGGENBENCH n=30 | CSQA 74.1 | 5 |
| Long Text Generation | LongGenBench 32K | CR 84.95 | 4 |
| Long Text Generation | LongGenBench 16K | CR 98.51 | 4 |
| Long-context generation | LongGenBench | Completion Rate 97.627 | 3 |