| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| RULER | Full attention | S-NIAH-1 (Pass-Key Retrieval) | 100 | 94 | 12d ago |
| Needle-in-a-Haystack 32K context (test) | Quest | Accuracy | 76 | 30 | 3mo ago |
| Needle-in-a-Haystack 8K context (test) | Quest | Accuracy | 100 | 30 | 3mo ago |
| NIAH | Transformer | NIAH Score | 100 | 14 | 22h ago |
| NIAH 64K 60 items, 3 needle positions (test) | | F1 Score | 28.2 | 8 | 14d ago |
| RULER (test) | DroPE transformer | Multi-Query Success Rate | 2,800 | 8 | 3mo ago |
| NIAH L=2048 | real_screen | Accuracy | 100 | 6 | 21d ago |
| RULER NIAH | | Retrieval Success (1K) | 100 | 4 | 26d ago |
| RULER S-NIAH-2 OOD | ASEntmax | Success Rate (4K Context) | 83.2 | 4 | 3mo ago |
| RULER S-NIAH-2 (ID) | Softmax | Retrieval Success Rate (1K) | 100 | 4 | 3mo ago |
| RULER S-NIAH-1 OOD | ASEntmax | Success Rate (4K Context) | 100 | 4 | 3mo ago |
| RULER S-NIAH-1 ID | Softmax | Retrieval Success Rate (1K Context) | 100 | 4 | 3mo ago |
| BABILong 32K context length | MemDLM (Train & Inference) | Accuracy | 9 | 3 | 2mo ago |
| BABILong 16K context length | MemDLM (Train & Inference) | Needle-in-a-Haystack Accuracy (16K) | 22.2 | 3 | 2mo ago |
| RULER 32K context length | MemDLM (Train & Inference) | RULER-MV Retrieval Score | 15.35 | 3 | 2mo ago |
| RULER 16K context length | MemDLM (Train & Inference) | RULER-MV Score | 29.4 | 3 | 2mo ago |
| NIAH Single 2 | StateX | Success Rate (4K Context) | 94 | 2 | 1mo ago |