| Dataset Name | SOTA Method | Metric | Value | Trend | | |
|---|---|---|---|---|---|---|
| LongBench | | F1 Score | 34 | 143 | 14d ago | |
| LongBench (test) | Llama3.1-8B | Avg Score | 58.7 | 136 | 1mo ago | |
| LongBench V2 | | Overall Score | 82.36 | 133 | 8d ago | |
| LongBench | DPO w/ LongReward | Overall Average Score | 62.1 | 115 | 3mo ago | |
| RULER 16k (test) | IndexMem | RULER Score | 93.5 | 90 | 7d ago | |
| RULER 4k (test) | ExpectedAttention | RULER 4k Score | 95.7 | 90 | 7d ago | |
| LongBench 1.0 (test) | LaProx | NarrativeQA | 32.94 | 84 | 1d ago | |
| LongBench | | HotpotQA | 57.15 | 82 | 1mo ago | |
| LongBench (test) | Qwen3-8B | FewShot Performance | 71.4 | 72 | 18h ago | |
| RULER | | Score | 96 | 66 | 1mo ago | |
| RULER | PyramidInfer | Performance @ 4K Context | 157 | 65 | 1mo ago | |
| LongBench | QUOKA | Accuracy | 103 | 60 | 3mo ago | |
| LongBench | LKV | Average Score | 46.25 | 43 | 22d ago | |
| LongBench | | Average Score | 48.11 | 38 | 4d ago | |
| RULER 32K | | Accuracy | 94.48 | 38 | 14d ago | |
| RULER 64K | RetroInfer | Accuracy | 92.37 | 37 | 14d ago | |
| LongBench V1 | | NQA | 31 | 36 | 22d ago | |
| InfiniteBench v1 (test) | SnapKV | Dialogue | 20 | 31 | 3mo ago | |
| LongBench | LKV | Average Score | 47.26 | 30 | 22d ago | |
| LongBench English | YaRN | Accuracy | 19.63 | 30 | 2mo ago | |
| MuSiQue | Logo-PO | SubEM | 51 | 27 | 1mo ago | |
| RULER 128K | | Accuracy | 88.3 | 27 | 14d ago | |
| RULER | Dense | Average Accuracy | 91.44 | 27 | 4d ago | |
| LongBench | Llama3.1-8B | Average Score | 48.37 | 26 | 4d ago | |
| InfiniteBench | SinkRouter | Math Score (F) | 0.5 | 25 | 1mo ago | |