| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Inference Efficiency | 32k context length efficiency Llama-3-8B (test) | Time To First Token (s) | 4.12 | 7 |
| End-to-end decode throughput | context length 128K | Throughput (tok/s) | 108 | 6 |
| End-to-end decode throughput | Context Length 32K | Decode Throughput (tok/s) | 183 | 6 |
| End-to-end decode throughput | 8K Context Length | Throughput (tok/s) | 268.5 | 6 |
| Efficiency Analysis | Context Length 32K | Theoretical Compute (TFLOPs) | 928 | 5 |
| Efficiency Analysis | Context Length 16K | Theoretical Compute (TFLOPs) | 336 | 5 |
| Efficiency Analysis | Context Length 4K | Theoretical Compute (TFLOPs) | 60 | 5 |
| Inference Efficiency | 90k Context Length Llama-3.1-8B | Throughput (queries/s) | 8.9 | 4 |
| Inference Efficiency | 30k Context Length (Llama-3.1-8B) | Inference Throughput (QPS) | 15.8 | 4 |
| Inference Efficiency | 30k Context Length Llama-2-7B | Inference Throughput (QPS) | 6.6 | 4 |
| KV Cache Footprint Evaluation | Context Length 128K 1.0 (test) | Effective b_KV (dense) | 2,360.6 | 3 |
| KV Cache Footprint Evaluation | Context Length 32K 1.0 (test) | Effective b_KV (dense) | 1,863.6 | 3 |
| KV Cache Footprint Evaluation | Context Length 8K 1.0 (test) | Effective KV Cache Size (dense) | 1,658.3 | 3 |
| LLM Inference Performance | Context Length 200K | Prefill Time (s) | 10.7 | 3 |
| LLM Inference Performance | Context Length 120K | Prefill Time (s) | 5.66 | 3 |
| LLM Inference Performance | Context Length 60K | Prefill Time (s) | 2.59 | 3 |
| LLM Inference Performance | Context Length 10K | Prefill Time (s) | 0.45 | 3 |