| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| LongBench (test) | Llama3.1-8B | Avg Score | 58.7 | 136 | 11d ago |
| LongBench | DPO w/ LongReward | Overall Average Score | 62.1 | 115 | 1mo ago |
| LongBench V2 | | Overall Score | 65.6 | 109 | 8d ago |
| LongBench | | HotpotQA | 57.15 | 82 | 3d ago |
| RULER | PyramidInfer | Performance @ 4K Context | 157 | 65 | 2d ago |
| LongBench | QUOKA | Accuracy | 103 | 60 | 1mo ago |
| RULER | | Score | 96 | 50 | 12d ago |
| LongBench 1.0 (test) | | NarrativeQA | 26.63 | 32 | 8d ago |
| InfiniteBench v1 (test) | SnapKV | Dialogue | 20 | 31 | 1mo ago |
| LongBench English | YaRN | Accuracy | 19.63 | 30 | 29d ago |
| LongBench V1 | | NQA | 31 | 30 | 1mo ago |
| LongBench (test) | VIST2-8B | SingleDoc Performance | 45.2 | 30 | 1mo ago |
| MuSiQue | Logo-PO | SubEM | 51 | 27 | 8d ago |
| RULER 32K | | Accuracy | 94.48 | 26 | 19d ago |
| RULER 64K | RetroInfer | Accuracy | 92.37 | 25 | 3d ago |
| RULER | Llama-3.1-8B-Instruct | Performance (8K Context) | 92.88 | 24 | 8d ago |
| InfiniteBench | Minference | Math Score (F) | 0.4771 | 22 | 17d ago |
| Infini-Bench (test) | SHAREDLLM | Math Score | 17.26 | 21 | 1mo ago |
| LongBench v1 (test) | Llama-3.1-8B | SD QA | 49.6 | 21 | 1mo ago |
| RULER | | S1 Score | 100 | 20 | 24d ago |
| LongBench | CSAttention | MQA-E Score | 56.02 | 18 | 5d ago |
| Average Overall | LongMab | SubEM | 40.95 | 18 | 8d ago |
| LongBench-E 1.0 (test) | | Qasper Score | 44.56 | 18 | 24d ago |
| LongBench | TidalDecode | MFQA | 30.94 | 18 | 1mo ago |
| LongBench | BLASST | Overall Average Score | 31.8 | 17 | 1mo ago |