| Dataset Name | SOTA Method | Metric | Score | Trend |
|---|---|---|---|---|
| LongBench | CortexDebate | M-Avg | 60.31 | 219 |
| LongBench (test) | | Average Score | 51.87 | 133 |
| InfiniteBench | | En.Sum | 32.93 | 63 |
| RULER 32k context length | Quest | Average Score | 87.5 | 30 |
| L-Eval | NTK | Coursera | 58.28 | 26 |
| L-Eval (test) | | Coursera | 58.28 | 26 |
| LongBench 1.0 (test) | Original | MultiNews | 61.5 | 21 |
| LongBench v2 | HyLRA | Overall Accuracy | 46.32 | 20 |
| SCROLLS (test) | CoLT5-XL | Average Score | 47.4 | 18 |
| SCBench | Llama-3.1-8B | KV Retrieval | 79 | 16 |
| LongBench-E (test) | HATA | LCC | 68.42 | 16 |
| LongBench-E | Exact | LCC | 69.96 | 9 |
| RULER 16k context length | | Single-Key Score | 100 | 8 |
| LongBench Llama-3.2-1B-Instruct (test) | | NQA (NarrativeQA) | 16.12 | 7 |
| SCROLLS (dev) | BART-large-SLED | GovRep ROUGE-1 | 57.4 | 7 |
| RULER 64k context length | | Multi-Key Score | 98.4 | 6 |
| LV-Eval | Qwen3-1.7B-Mamba | CMRC (Mixup) | 7.05 | 4 |
| InfiniteBench | Qwen3-1.7B-ALLMEM | InfiniteBench QA (EN) Score | 7.84 | 4 |
| RULER 128k context length | Vanilla | Average Score | 49.11 | 4 |
| LongBench Zh out-of-domain (evaluation) | | SingleDoc Acc | 61.2 | 3 |
| RULER 4k context length | | Single-Key Score | 100 | 2 |