| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| LongBench | CAP-CoT | Accuracy (LongBench) 70.4 | 101 | 1mo ago | |
| LongBench v2 | | Average Score 68.2 | 88 | 2d ago | |
| LoCoMo | FluxMem | Average F1 95.06 | 75 | 6d ago | |
| BABILong 16k | | Accuracy 29.8 | 72 | 1mo ago | |
| BABILong 8k | | Accuracy 34.7 | 65 | 1mo ago | |
| LongBench | GHG-TDA | Score 73.8 | 62 | 3mo ago | |
| BABILong 4k | | Accuracy (BABILong 4k) 38.5 | 51 | 1mo ago | |
| OOLONG | λ-RLM | Accuracy 68.4 | 37 | 1mo ago | |
| Long-context Benchmarks 100K context LB-V2 DocMath Frames LB-MQA (test) | Qwen3-30B-A3B-Thinking + SPELL | DocMath Score 66.7 | 36 | 3mo ago | |
| Long-context Benchmarks 16K context DocMath Frames LB-MQA V2 (test) | Qwen3-30B-A3B-Thinking + SPELL | DocMath 64.1 | 36 | 3mo ago | |
| RULER | HyLo-Llama-14MLA14M2 | RULER Performance (8K Context) 75.3 | 35 | 1mo ago | |
| LongReason 64K-input 70K context | KVZip | Accuracy 71.25 | 34 | 6d ago | |
| ∞ Bench | MiA (Emb-Only) | Accuracy 90.39 | 32 | 3mo ago | |
| OOLONG trec_coarse | Kimi K2 | Score 86.6 | 28 | 2mo ago | |
| OOL-Pairs | | Latency (s) 5.1 | 27 | 2mo ago | |
| OOLONG | | Latency (s) 7.1 | 27 | 2mo ago | |
| AA-LCR | LoongRL | Score 53.5 | 26 | 2d ago | |
| LongGenBench 8K | | GSM8K Score 44.51 | 22 | 5d ago | |
| LongGenBench 4K | | GSM8K Score 53.18 | 22 | 5d ago | |
| LongReason | | Score 86.9 | 18 | 2d ago | |
| FRAMES | DocQA | Score 83.5 | 18 | 2d ago | |
| Long-context Reasoning Suite (test) | | Average Score 74.91 | 18 | 14d ago | |
| BrowseComp+ 1K documents | SRLM (no sub-calls) | Accuracy 94.6 | 16 | 2mo ago | |
| LongBench 256 tokens v2 | Mamba | Accuracy 100 | 14 | 12d ago | |
| LongBench | CoT | Relative Cost 1 | 14 | 1mo ago | |