| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| LoCoMo | MemCoT | F1 (Multi Hop) | 45.1 | 109 | 3d ago |
| HotpotQA In-Distribution | QwenLong-L1-32B | Accuracy | 85.2 | 72 | 1mo ago |
| LongBench (test) | LingoEDU | HotpotQA | 7,011 | 69 | 25d ago |
| 2WikiMultiHopQA (Out-Of-Distribution) | ReMemR1 | Accuracy | 63.9 | 54 | 1mo ago |
| NarrativeQA | MiA-RAG | F1 Score | 53.56 | 38 | 1mo ago |
| En.QA | Logo-PO | SubEM | 36.75 | 36 | 8d ago |
| NarrativeQA | LongMab | SubEM | 22 | 36 | 8d ago |
| MFQA En | LongReward-PO | SubEM | 29.33 | 36 | 8d ago |
| 2WikiMQA | LongMab | SubEM | 79.5 | 36 | 8d ago |
| DetectiveQA-En | MiA | Accuracy | 75.5 | 32 | 1mo ago |
| DetectiveQA-Zh | MiA | Accuracy | 0.8417 | 32 | 1mo ago |
| LoCoMo | Mnemis | Single-Hop LLJ Score | 97.1 | 24 | 1mo ago |
| LongBench V2 | | SingleDoc Accuracy | 51.43 | 22 | 1mo ago |
| FRAMES | | Avg@4 Score | 73.54 | 22 | 1mo ago |
| HotpotQA | | Mean Score | 65.49 | 21 | 1mo ago |
| MuSiQue | LongMab | F1 Score | 51.02 | 19 | 8d ago |
| NarrativeQA Passage Split | | Score | 32.64 | 18 | 1mo ago |
| HotpotQA Passage | | Score | 60.03 | 18 | 1mo ago |
| MuSiQue (Passage Split) | | Score | 39.46 | 18 | 1mo ago |
| 2WikiMQA (Passage Split) | | Score | 52.53 | 18 | 1mo ago |
| NarrativeQA Fixed Chunk 2048 | | Score | 32.64 | 18 | 1mo ago |
| HotpotQA Fixed Chunk 2048 | | QA Score | 60.03 | 18 | 1mo ago |
| MuSiQue Fixed Chunk 2048 | | Score | 39.46 | 18 | 1mo ago |
| 2WikiMQA Fixed Chunk 2048 | | QA Score | 52.53 | 18 | 1mo ago |
| Qasper | CE-GOCD | F1 | 83.09 | 17 | 1mo ago |