| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| LongBench | | Average Score | 58.4 | 164 | 10d ago |
| RULER | GA-S2 | RULER Score | 0.911 | 148 | 1mo ago |
| RULER 16K context | SINK + TAIL + TOP-K + ϕ | Accuracy (RULER 16K) | 83 | 72 | 10d ago |
| RULER | | Accuracy | 89.1 | 51 | 3d ago |
| Ruler llama3-8B-Instruct (test) | | S-NIAH-1 | 100 | 37 | 1mo ago |
| LongBench-E 1.0 (test) | Elastic Attention | S-Doc QA Perf. | 49.92 | 37 | 1mo ago |
| LongBench (test) | | Qasper Score | 50.87 | 29 | 9d ago |
| HELMET | | Summarization Score | 247 | 27 | 1mo ago |
| ZeroSCROLLS (test) | GDWM | GovReport Score | 35.8 | 24 | 1mo ago |
| RULER 1.0 (test) | MInference | Accuracy (4K Context) | 0.977 | 16 | 1mo ago |
| InfiniteBench (test) | Llama 3.1 8B Instruct | En QA Score | 34.82 | 14 | 8d ago |
| RULER (test) | ProxyAttn | Sparsity | 80 | 13 | 17d ago |
| LongBench v2 | AdmTree | Single Doc QA | 34.9 | 10 | 1mo ago |
| LongBench 1.0 (test) | | NrtvQA | 32.6 | 8 | 4d ago |
| LongBench MultiFieldQA, MuSiQue, GovReport 2023 (test) | DroPE | MultiFieldQA Score | 32.18 | 8 | 1mo ago |
| RULER (test) | Baseline | Accuracy (4k Context) | 96.6 | 7 | 10d ago |
| LongBench V2 (test) | | Acc (Short) | 60 | 7 | 5d ago |
| Long-Context Evaluation Suite MRCR v2, GraphWalks, LongBench v2, RULER, AA-LCR | IndexCache | Average Score | 78.7 | 5 | 1mo ago |
| LongPPL 32k | Engram-27B | Book Perplexity | 4.14 | 4 | 1mo ago |
| LongBench | Qwen3-1.7B | SAMSum | 42.04 | 3 | 1mo ago |
| RULER | CoPE | Accuracy (8k Context) | 81.5 | 2 | 1mo ago |