SCROLLS

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-context language understanding suite | ZeroSCROLLS | GovReport Score | 33.5 | 24 |
| Zero-shot long-context reasoning | ZeroSCROLLS | Average Score | 33.5 | 18 |
| Long-context language understanding | SCROLLS (test) | Average Score | 47.4 | 18 |
| Question Answering | Scrolls NarraQA | Accuracy | 12.29 | 10 |
| Question Answering | Scrolls QAsper | Accuracy | 14.8 | 10 |
| Summarization | SCROLLS | ROUGE-1 | 14.83 | 8 |
| Long-context language understanding | SCROLLS (dev) | GovRep ROUGE-1 | 57.4 | 7 |
| Long-context Open-book Question Answering and Summarization | SCROLLS (val) | NaQA F1 Score | 23.9 | 6 |