SCROLLS

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long-context language understanding suite | ZeroSCROLLS | GovReport Score 33.5 | 24 |
| Long-context language understanding | SCROLLS (test) | Average Score 47.4 | 18 |
| Question Answering | Scrolls NarraQA | Accuracy 12.29 | 10 |
| Question Answering | Scrolls QAsper | Accuracy 14.8 | 10 |
| Summarization | SCROLLS | ROUGE-1 14.83 | 8 |
| Long-context language understanding | SCROLLS (dev) | GovRep ROUGE-1 57.4 | 7 |
| Long-context Open-book Question Answering and Summarization | SCROLLS (val) | NaQA F1 Score 23.9 | 6 |