Long Context Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long-context Reasoning | Long-context Benchmarks 100K context LB-V2 DocMath Frames LB-MQA (test) | DocMath Score: 66.7 | 36 |
| Long-context Reasoning | Long-context Benchmarks 16K context DocMath Frames LB-MQA V2 (test) | DocMath: 64.1 | 36 |
| Long Context Evaluation | Long Context Benchmarks | MDQA-10 Score: 32.3 | 5 |