| Dataset Name | SOTA Method | Metric | Value | | Updated |
|---|---|---|---|---|---|
| BABILong 16k | | Accuracy | 29.8 | 72 | 10d ago |
| BABILong 8k | | Accuracy | 34.7 | 65 | 10d ago |
| LongBench | GHG-TDA | Score | 73.8 | 62 | 1mo ago |
| BABILong 4k | | Accuracy (BABILong 4k) | 38.5 | 51 | 10d ago |
| LongBench v2 | | Average Score | 68.2 | 48 | 10d ago |
| LongBench | AoT | Accuracy (LongBench) | 68.7 | 45 | 10d ago |
| LoCoMo | MemOS | Average F1 | 44.94 | 45 | 3d ago |
| OOLONG | λ-RLM | Accuracy | 68.4 | 37 | 10d ago |
| Long-context Benchmarks 100K context LB-V2 DocMath Frames LB-MQA (test) | Qwen3-30B-A3B-Thinking + SPELL | DocMath Score | 66.7 | 36 | 1mo ago |
| Long-context Benchmarks 16K context DocMath Frames LB-MQA V2 (test) | Qwen3-30B-A3B-Thinking + SPELL | DocMath | 64.1 | 36 | 1mo ago |
| ∞ Bench | MiA (Emb-Only) | Accuracy | 90.39 | 32 | 1mo ago |
| OOLONG trec_coarse | Kimi K2 | Score | 86.6 | 28 | 1mo ago |
| OOL-Pairs | | Latency (s) | 5.1 | 27 | 26d ago |
| OOLONG | | Latency (s) | 7.1 | 27 | 26d ago |
| BrowseComp+ 1K documents | SRLM (no sub-calls) | Accuracy | 94.6 | 16 | 1mo ago |
| LongBench | CoT | Relative Cost | 1 | 14 | 10d ago |
| BAMBOO 16k | DRIFT | AltQA Score | 41.5 | 13 | 1mo ago |
| LongBench | Nirvana | NQA | 16.6 | 12 | 9d ago |
| Oolong-Synth | | Accuracy | 78.41 | 11 | 25d ago |
| BrowsCompLong | | Accuracy | 88.07 | 11 | 25d ago |
| LOONG | | Accuracy | 65.43 | 11 | 25d ago |
| OfficeQA | | Accuracy | 57.14 | 10 | 10d ago |
| LongSeal | | Accuracy | 64.96 | 10 | 10d ago |
| LongBench Llama-2-7B-4K | WINA | Code Completion | 62.54 | 9 | 1mo ago |
| AA-LCR | gpt-oss-120b | Score | 48.3 | 8 | 1mo ago |