Long-context Language Understanding

Benchmarks

Dataset Name	SOTA Method	Metric	Value		Last Updated
LongBench	CortexDebate	M-Avg	60.31	294	2mo ago
LongBench (test)		Average Score	51.87	147	1mo ago
InfiniteBench	Full	En.Sum	33.01	100	1mo ago
LongBench-e	Dense	Average Score	53.04	93	1mo ago
LongBench		Average Score	58.4	90	24d ago
LongBench 1.0 (test)	Original	MultiNews	61.5	73	1mo ago
LongBench v2	HyLRA	Overall Accuracy	46.32	62	2mo ago
LongBench		Aggregate Score	49.8	60	1mo ago
LongBench v1 (test)		NrtvQA Score	30.7	48	2mo ago
RULER 32k context length		FWE	0	39	2mo ago
LongBench		NQA	31.42	38	2mo ago
LongBench	K-VEC	NrtvQA Score	30.04	37	25d ago
LongBench		NrtvQA Score	27.84	29	3mo ago
L-Eval	NTK	Coursera	58.28	26	4mo ago
L-Eval (test)		Coursera	58.28	26	4mo ago
Longbench	KEYDIFF	NQA	32.3	25	2mo ago
LongBench		NtrvQA	30.46	22	1mo ago
RULER 64k context length		FWE (Error)	0	22	1mo ago
RULER 16k context length		FWE Score	0	21	2mo ago
LongBench		NrtvQA	29.7	20	2mo ago
LongBench 2024 (test)	Block-Dist - Full	Multi-doc QA	47.23	20	2mo ago
RULER 16K		CWE Score	89.28	18	1mo ago
RULER 16K 1.0 (test)		CWE Score	89.28	18	1mo ago
LongBench	Palu	NrtvQA	30.54	18	3mo ago
SCROLLS (test)	COLT5-XL	Average Score	47.4	18	4mo ago

Showing 25 of 79 rows