| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| LoCoMo | Block-Dist - Full | F1 (Multi Hop) | 50.25 | 171 | 13d ago |
| LongMemEval LongConvQA | InfiniPot | SH Score | 90.3 | 84 | 13d ago |
| HotpotQA In-Distribution | QwenLong-L1-32B | Accuracy | 85.2 | 72 | 3mo ago |
| LongBench (test) | LingoEDU | HotpotQA | 7,011 | 69 | 2mo ago |
| 2WikiMultiHopQA (Out-Of-Distribution) | ReMemR1 | Accuracy | 63.9 | 54 | 3mo ago |
| LongBench N=162 | | F1 Score | 31.5 | 45 | 14d ago |
| LoCoMo | Mnemis | Single-Hop LLJ Score | 97.1 | 45 | 21d ago |
| DetectiveQA-En | MiA | Accuracy | 75.5 | 38 | 26d ago |
| DetectiveQA-Zh | MiA-RAG | Accuracy | 80 | 38 | 26d ago |
| NarrativeQA | MiA-RAG | F1 Score | 53.56 | 38 | 3mo ago |
| En.QA | Logo-PO | SubEM | 36.75 | 36 | 1mo ago |
| NarrativeQA | LongMab | SubEM | 22 | 36 | 1mo ago |
| MFQA En | LongReward-PO | SubEM | 29.33 | 36 | 1mo ago |
| 2WikiMQA | LongMab | SubEM | 79.5 | 36 | 1mo ago |
| LongBench V2 | | Overall Accuracy | 56.77 | 33 | 26d ago |
| FRAMES | | Avg@4 Score | 73.54 | 22 | 3mo ago |
| HotpotQA | | Mean Score | 65.49 | 21 | 3mo ago |
| LongBench | | HotPotQA Accuracy | 59.71 | 20 | 7d ago |
| MuSiQue | LongMab | F1 Score | 51.02 | 19 | 1mo ago |
| Qasper 128K context | | F1 Score | 39 | 18 | 1mo ago |
| NarrativeQA Passage Split | | Score | 32.64 | 18 | 2mo ago |
| HotpotQA Passage | | Score | 60.03 | 18 | 2mo ago |
| MuSiQue (Passage Split) | | Score | 39.46 | 18 | 2mo ago |
| 2WikiMQA (Passage Split) | | Score | 52.53 | 18 | 2mo ago |
| NarrativeQA Fixed Chunk 2048 | | Score | 32.64 | 18 | 2mo ago |