| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Many-shot in-context learning | Long-context benchmarks | ICL Performance (8k Context) | 74.2 | 21 |
| Context Management | Long-context (test) | mTokens | 1 | 19 |
| Long Context | Long Context benchmark | Accuracy | 67.59 | 14 |
| Tabular Learning | Long-context 15 datasets v2 (test) | Avg. Normalized RMSE | 0.523 | 9 |
| Average across tasks | Long-context benchmarks | Performance (8k Context) | 45.9 | 8 |
| End-to-end LLM Inference Serving | Long-context 1024-token input, 32-token output | TPOT Speedup vs DeepGEMM | 1.48 | 3 |
| Long-Context Training | Long-Context (train) | - | - | 0 |