| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| LongBench | GHG-TDA | Score | 73.8 | 62 | 4d ago |
| LongBench v2 | | Average Score | 68.2 | 48 | 4d ago |
| Long-context Benchmarks 100K context LB-V2 DocMath Frames LB-MQA (test) | Qwen3-30B-A3B-Thinking + SPELL | DocMath Score | 66.7 | 36 | 4d ago |
| Long-context Benchmarks 16K context DocMath Frames LB-MQA V2 (test) | Qwen3-30B-A3B-Thinking + SPELL | DocMath | 64.1 | 36 | 4d ago |
| ∞ Bench | MiA (Emb-Only) | Accuracy | 90.39 | 32 | 4d ago |
| LoCoMo | MemOS | Average F1 | 44.94 | 25 | 4d ago |
| BAMBOO 16k | DRIFT | AltQA Score | 41.5 | 13 | 4d ago |
| LongBench Llama-2-7B-4K | WINA | Code Completion | 62.54 | 9 | 4d ago |
| AA-LCR | gpt-oss-120b | Score | 48.3 | 8 | 4d ago |
| LoCoMo (test) | FullContext | LLM Score | 72.3 | 7 | 4d ago |
| BABILong | RoPE++EH | Err (2k Context) | 14.1 | 6 | 4d ago |
| GPQA Diamond (out-of-distribution) | VTC-R1 | Accuracy | 48.5 | 6 | 4d ago |
| InfiniteBench (test) | Qwen3-4B-SFT w/ DAPO | Reasoning Pa Score | 87.63 | 6 | 4d ago |
| LongBench and Needle-In-A-Haystack (NIAH) (test) | SmolLM-DroPE | MultiFieldQA Score | 29.33 | 5 | 4d ago |