| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-context Language Understanding | LongBench | M-Avg: 60.31 | 219 |
| Long-context language understanding | LongBench (test) | Average Score: 51.87 | 133 |
| Long-context understanding | LongBench | Overall Average Score: 62.1 | 115 |
| Long-context understanding | LongBench (test) | Avg Score: 54 | 80 |
| Long-context Reasoning | LongBench | Score: 73.8 | 62 |
| Long-context Understanding | LongBench | Accuracy: 103 | 60 |
| Long-context Question Answering | LongBench (test) | HotpotQA: 7,011 | 59 |
| Long document retrieval | LongBench Retrieval v2 (full) | F1 Score: 0.4843 | 55 |
| Long-context Reasoning | LongBench v2 | Average Score: 68.2 | 48 |
| Long-context Language Modeling | LongBench | Single-Document QA: 42.77 | 44 |
| Long-context language modeling | LongBench-E 1.0 (test) | S-Doc QA Perf.: 49.92 | 37 |
| Long Context Understanding | LongBench V2 | Overall Score: 65.6 | 37 |
| Single-Doc Question Answering | LongBench | MultifieldQA Score: 49.34 | 36 |
| Long-context understanding | LongBench 1.0 (test) | NarrativeQA: 26.63 | 32 |
| Long-context understanding | LongBench V1 | NQA: 31 | 30 |
| Long-context understanding | LongBench (test) | SingleDoc Performance: 45.2 | 30 |
| Long-context understanding | LongBench | 2WikiMQA: 55.13 | 25 |
| Long-context Question Answering | LongBench V2 | SingleDoc Accuracy: 51.43 | 22 |
| Long-context understanding | LongBench v1 (test) | SD QA: 49.6 | 21 |
| Long-context language understanding | LongBench 1.0 (test) | MultiNews: 61.5 | 21 |
| Long-context language understanding | LongBench v2 | Overall Accuracy: 46.32 | 20 |
| Long-context understanding | LongBench | MFQA: 30.94 | 18 |
| Long-context understanding | LongBench | Overall Average Score: 31.8 | 17 |
| Long-context Language Understanding | LongBench-e (test) | LCC (Language Comprehension Score): 68.42 | 16 |
| Long-context generation | LongBench | Average Score: 48.5 | 16 |