Long Context Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long-context Reasoning | Long-context Benchmarks 100K context LB-V2 DocMath Frames LB-MQA (test) | DocMath Score: 66.7 | 36 |
| Long-context Reasoning | Long-context Benchmarks 16K context DocMath Frames LB-MQA V2 (test) | DocMath: 64.1 | 36 |
| Fact chaining & relational reasoning | Long-context benchmarks | Accuracy (8k Context): 52.8 | 21 |
| Multi-round co-reference resolution | Long-context benchmarks | Score (8k Context): 38.5 | 21 |
| Passage re-ranking | Long-context benchmarks | Performance (8k Context): 50.5 | 21 |
| Synthetic recall | Long-context benchmarks | Synthetic Recall (8k context): 100 | 21 |
| Retrieval-Augmented Generation | Long-context benchmarks | RAG Score (8k Context): 53.7 | 16 |
| Long Context Evaluation | Long Context Benchmarks | MDQA-10 Score: 32.3 | 5 |