SCROLLS

Benchmarks

Task Name	Dataset Name	SOTA Result
Long-context language understanding suite	ZeroSCROLLS	GovReport Score: 33.5	24
Zero-shot long-context reasoning	ZeroSCROLLS	Average Score: 33.5	18
Long-context language understanding	SCROLLS (test)	Average Score: 47.4	18
Question Answering	Scrolls NarraQA	Accuracy: 12.29	10
Question Answering	Scrolls QAsper	Accuracy: 14.8	10
Summarization	SCROLLS	ROUGE-1: 14.83	8
Long-context language understanding	SCROLLS (dev)	GovRep ROUGE-1: 57.4	7
Long-context Open-book Question Answering and Summarization	SCROLLS (val)	NaQA F1 Score: 23.9	6