Long Context Benchmarks

Benchmarks

Task Name	Dataset Name	SOTA Result
Long-context Reasoning	Long-context Benchmarks 100K context LB-V2 DocMath Frames LB-MQA (test)	DocMath Score: 66.7	36
Long-context Reasoning	Long-context Benchmarks 16K context DocMath Frames LB-MQA V2 (test)	DocMath: 64.1	36
Fact chaining & relational reasoning	Long-context benchmarks	Accuracy (8k Context): 52.8	21
Multi-round co-reference resolution	Long-context benchmarks	Score (8k Context): 38.5	21
Passage re-ranking	Long-context benchmarks	Performance (8k Context): 50.5	21
Synthetic recall	Long-context benchmarks	Synthetic Recall (8k context): 100	21
Retrieval-Augmented Generation	Long-context benchmarks	RAG Score (8k Context): 53.7	16
Long Context Evaluation	Long Context Benchmarks	MDQA-10 Score: 32.3	5
