| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| RULER 16k | | Total Score | 95.02 | 59 | 2d ago |
| RULER 32k | HySparse | Overall Score | 89.3 | 41 | 2d ago |
| RULER 8k | QUOKA | Score | 91.07 | 35 | 2d ago |
| RULER 4k | QUOKA | Score | 93.73 | 35 | 2d ago |
| RULER 128k | Llama-3.1-8B | Query Metric (MQ) | 98 | 29 | 4d ago |
| RULER 64k | Llama-3.1-8B | VT Score | 100 | 29 | 4d ago |
| LB v2 (ALL) | | Accuracy (ALL) | 38 | 13 | 4d ago |
| L-Eval | InternLM2-Chat-20B-SFT | Close Score | 68.8 | 13 | 4d ago |
| RULER 32K context length (test) | | Niah1 Score | 100 | 12 | 4d ago |
| Humanity's Last Exam AA-LCR | GLM-4.6 | Accuracy | 54.3 | 6 | 4d ago |
| Long Context Benchmarks | DD | MDQA-10 Score | 32.3 | 5 | 4d ago |
| LongBench | CompilerKV | Average Score | 37.97 | 4 | 4d ago |
| RULER ultra-long context official | | Accuracy (128K) | 96 | 4 | 4d ago |
| RULER 256K | Dense | NS1 (Sequence Accuracy 1) | 100 | 3 | 4d ago |