| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| LongBench | | Average Score | 58.4 | 328 | 4d ago |
| RULER | DASH-3 | RULER Score | 0.9142 | 204 | 4d ago |
| RULER | Qwen2.5-14B-Instruct-1M | Accuracy (8K Context) | 96.29 | 75 | 8d ago |
| RULER 16K context | SINK + TAIL + TOP-K + ϕ | Accuracy (RULER 16K) | 83 | 72 | 1mo ago |
| Ruler llama3-8B-Instruct (test) | | S-NIAH-1 | 100 | 37 | 2mo ago |
| LongBench-E 1.0 (test) | Elastic Attention | S-Doc QA Perf. | 49.92 | 37 | 3mo ago |
| RULER 4K | | Accuracy | 95.3 | 29 | 1d ago |
| LongBench (test) | | Qasper Score | 50.87 | 29 | 1mo ago |
| HELMET | | Summarization Score | 247 | 27 | 3mo ago |
| ZeroSCROLLS (test) | GDWM | GovReport Score | 35.8 | 24 | 3mo ago |
| LongBench 4-task average | 2d hetero | Average Accuracy | 12.7 | 17 | 1mo ago |
| RULER 1.0 (test) | MInference | Accuracy (4K Context) | 0.977 | 16 | 3mo ago |
| InfiniteBench (test) | Llama 3.1 8B Instruct | En QA Score | 34.82 | 14 | 1mo ago |
| RULER (test) | ProxyAttn | Sparsity | 80 | 13 | 2mo ago |
| LongBench v2 | FoLoRA | Accuracy | 29.62 | 12 | 16h ago |
| LongBench | AdaKV w/ CriticalKV | LongBench Average Score | 46.23 | 12 | 4d ago |
| InfiniteBench | XAttn | En. Sum Accuracy | 18 | 10 | 27d ago |
| RULER Sequence length = 64k | RDKV | S-NIAH Score (Component 1) | 100 | 8 | 5d ago |
| LongBench 1.0 (test) | | NrtvQA | 32.6 | 8 | 1mo ago |
| LongBench MultiFieldQA, MuSiQue, GovReport 2023 (test) | DroPE | MultiFieldQA Score | 32.18 | 8 | 3mo ago |
| RULER (test) | Baseline | Accuracy (4k Context) | 96.6 | 7 | 1mo ago |
| LongBench V2 (test) | | Acc (Short) | 60 | 7 | 1mo ago |
| LongBench 16K context length | UltraLLaDA + BA-Att | NrtvQA Score | 13.3 | 6 | 13d ago |
| 32K context | | Quality Score | 79.39 | 6 | 13d ago |
| RULER | RoPE | CWE (Context Length 4K) | 10.21 | 6 | 21d ago |