| Dataset Name | SOTA Method | Metric | Trend | Last Updated | |
|---|---|---|---|---|---|
| RULER 16k | | Total Score: 95.02 | 59 | 1mo ago | |
| LongBench | LOOKAHEADKV | Average Score: 31.96 | 57 | 1mo ago | |
| Ruler (test) | | S-NIAH-1: 100 | 43 | 1mo ago | |
| RULER 32k | HySparse | Overall Score: 89.3 | 41 | 1mo ago | |
| RULER 8k | QUOKA | Score: 91.07 | 35 | 1mo ago | |
| RULER 4k | QUOKA | Score: 93.73 | 35 | 1mo ago | |
| RULER | RecaLLM-Qwen2.5-7B | Accuracy (Context 4k): 98.8 | 34 | 5d ago | |
| RULER 128k | Llama-3.1-8B | Query Metric (MQ): 98 | 29 | 1mo ago | |
| RULER 64k | Llama-3.1-8B | VT Score: 100 | 29 | 1mo ago | |
| LB v2 (ALL) | | Accuracy (ALL): 38 | 13 | 1mo ago | |
| L-Eval | InternLM2-Chat-20B-SFT | Close Score: 68.8 | 13 | 1mo ago | |
| RULER 32K context length (test) | | Niah1 Score: 100 | 12 | 1mo ago | |
| LongBench v2 | CSAttention | Overall Score: 31.2 | 6 | 5d ago | |
| Humanity's Last Exam AA-LCR | GLM-4.6 | Accuracy: 54.3 | 6 | 1mo ago | |
| Long Context Benchmarks | DD | MDQA-10 Score: 32.3 | 5 | 1mo ago | |
| RULER 128K sequences Llama3.1-70B-Instruct | FullAttention | RULER Score: 65.03 | 4 | 17d ago | |
| RULER ultra-long context official | | Accuracy (128K): 96 | 4 | 1mo ago | |
| RULER 256K | Dense | NS1 (Sequence Accuracy 1): 100 | 3 | 3d ago | |