| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| LongBench | LOOKAHEADKV | Average Score 31.96 | 90 | 5d ago | |
| RULER 16k | | Total Score 95.02 | 59 | 3mo ago | |
| RULER | RecaLLM-Qwen2.5-7B | Average Accuracy Score 92.8 | 54 | 5d ago | |
| RULER 32k | RTPurbo | Overall Score 90.06 | 49 | 15d ago | |
| Ruler (test) | | S-NIAH-1 100 | 43 | 2mo ago | |
| RULER 64k | Llama-3.1-8B | VT Score 100 | 43 | 15d ago | |
| RULER 8k | QUOKA | Score 91.07 | 35 | 3mo ago | |
| RULER 4k | QUOKA | Score 93.73 | 35 | 3mo ago | |
| RULER 128k | Llama-3.1-8B | Query Metric (MQ) 98 | 29 | 3mo ago | |
| LongBench (test) | xKV | NarQA Score 32.85 | 18 | 16d ago | |
| Ruler | Ministral-3-8B | Average Rank 2 | 16 | 23d ago | |
| LB v2 (ALL) | | Accuracy (ALL) 38 | 13 | 3mo ago | |
| L-Eval | InternLM2-Chat-20B-SFT | Close Score 68.8 | 13 | 3mo ago | |
| RULER 32K context length (test) | | Niah1 Score 100 | 12 | 3mo ago | |
| LongBench v2 | Qwen3-235B-A22B-Thinking | Overall Score 59.76 | 9 | 12d ago | |
| 128K context | | Quality Score (Q) 80.12 | 6 | 14d ago | |
| MultiNews, Qasper, RepoBench-P, and RULER Averaged 128K (test) | TTKV | Memory Footprint (GB) 15.3 | 6 | 1mo ago | |
| RULER | | Context Window Error (CWE) 1.23 | 6 | 1mo ago | |
| Humanity's Last Exam AA-LCR | GLM-4.6 | Accuracy 54.3 | 6 | 3mo ago | |
| Long Context Benchmarks | DD | MDQA-10 Score 32.3 | 5 | 3mo ago | |
| RULER 128K sequences Llama3.1-70B-Instruct | FullAttention | RULER Score 65.03 | 4 | 2mo ago | |
| RULER ultra-long context official | | Accuracy (128K) 96 | 4 | 3mo ago | |
| RULER 256K | Dense | NS1 (Sequence Accuracy 1) 100 | 3 | 1mo ago | |
| RULER 32k context Average 13 tasks | | Score 0.635 | 2 | 19d ago | |
| RULER 16k context Average 13 tasks | | Score 75 | 2 | 19d ago | |