| Task Name | Setting | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Inference Efficiency | 32K context length, Llama-3-8B (test) | Time To First Token (s) | 4.12 | 7 |
| Efficiency Analysis | Context Length 32K | Theoretical Compute (TFLOPs) | 928 | 5 |
| Efficiency Analysis | Context Length 16K | Theoretical Compute (TFLOPs) | 336 | 5 |
| Efficiency Analysis | Context Length 4K | Theoretical Compute (TFLOPs) | 60 | 5 |
| Inference Efficiency | 90K context length, Llama-3.1-8B | Throughput (queries/s) | 8.9 | 4 |
| Inference Efficiency | 30K context length, Llama-3.1-8B | Inference Throughput (QPS) | 15.8 | 4 |
| Inference Efficiency | 30K context length, Llama-2-7B | Inference Throughput (QPS) | 6.6 | 4 |
| LLM Inference Performance | Context Length 200K | Prefill Time (s) | 10.7 | 3 |
| LLM Inference Performance | Context Length 120K | Prefill Time (s) | 5.66 | 3 |
| LLM Inference Performance | Context Length 60K | Prefill Time (s) | 2.59 | 3 |
| LLM Inference Performance | Context Length 10K | Prefill Time (s) | 0.45 | 3 |
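The Efficiency Analysis rows show theoretical compute growing superlinearly with context length (4K → 16K → 32K gives 60 → 336 → 928 TFLOPs), which is the expected signature of the quadratic attention term on top of the linear per-token cost. A common back-of-envelope estimate for prefill compute is sketched below; the model parameters (8B parameters, 32 layers, 4096 hidden size, roughly Llama-3-8B-shaped) and the ~2 FLOPs/param/token plus ~4·layers·L²·d attention approximation are assumptions for illustration, not the exact accounting behind the table's numbers:

```python
def prefill_tflops(context_len: int,
                   n_params: float = 8e9,
                   n_layers: int = 32,
                   d_model: int = 4096) -> float:
    """Rough theoretical prefill compute for a decoder-only transformer.

    Linear term (MLP + projections): ~2 FLOPs per parameter per token.
    Attention term: ~4 * n_layers * L^2 * d_model (QK^T plus AV matmuls),
    which dominates at long context and drives the superlinear growth.
    """
    linear = 2.0 * n_params * context_len
    attention = 4.0 * n_layers * (context_len ** 2) * d_model
    return (linear + attention) / 1e12  # convert FLOPs to TFLOPs

for length in (4_096, 16_384, 32_768):
    print(f"{length:>6} tokens: ~{prefill_tflops(length):,.0f} TFLOPs")
```

Under these assumptions the estimate lands in the same regime as the table (tens of TFLOPs at 4K, roughly 3x that again from 16K to 32K rather than the 2x a purely linear cost would give), though the exact published values depend on the model's true architecture and which operations are counted.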