| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| LongBench | CortexDebate | M-Avg 60.31 | 294 | 20d ago | |
| LongBench (test) | | Average Score 51.87 | 147 | 1mo ago | |
| LongBench-e | Dense | Average Score 53.04 | 93 | 6d ago | |
| InfiniteBench | Full | En.Sum 33.01 | 88 | 13d ago | |
| LongBench | | Average Score 58.4 | 86 | 1mo ago | |
| LongBench v2 | HyLRA | Overall Accuracy 46.32 | 62 | 20d ago | |
| LongBench 1.0 (test) | Original | MultiNews 61.5 | 61 | 1mo ago | |
| LongBench v1 (test) | | NrtvQA Score 30.7 | 48 | 21d ago | |
| RULER 32k context length | | FWE 0 | 39 | 20d ago | |
| LongBench | | NQA 31.42 | 38 | 11d ago | |
| LongBench | | NrtvQA Score 27.84 | 29 | 1mo ago | |
| LongBench | | NrtvQA Score 27.96 | 26 | 21d ago | |
| L-Eval | NTK | Coursera 58.28 | 26 | 3mo ago | |
| L-Eval (test) | | Coursera 58.28 | 26 | 3mo ago | |
| LongBench | KEYDIFF | NQA 32.3 | 25 | 1mo ago | |
| LongBench | | NtrvQA 30.46 | 22 | 1d ago | |
| RULER 64k context length | | FWE (Error) 0 | 22 | 5d ago | |
| RULER 16k context length | | FWE Score 0 | 21 | 20d ago | |
| LongBench | | NrtvQA 29.7 | 20 | 14d ago | |
| LongBench 2024 (test) | Block-Dist - Full | Multi-doc QA 47.23 | 20 | 15d ago | |
| RULER 16K | | CWE Score 89.28 | 18 | 1d ago | |
| RULER 16K 1.0 (test) | | CWE Score 89.28 | 18 | 1d ago | |
| LongBench | Palu | NrtvQA 30.54 | 18 | 1mo ago | |
| SCROLLS (test) | COLT5-XL | Average Score 47.4 | 18 | 3mo ago | |
| LongBench | MISA† | SQA Score 50.91 | 16 | 4d ago | |