| Dataset Name | SOTA Method | Metric | Value | | Last Updated |
|---|---|---|---|---|---|
| LLM Decoding | | cuBLASLt Throughput | 40,000 | 78 | 2mo ago |
| Llama 70B 3.1 | DeepFusionKernel | Throughput | 3,119.55 | 48 | 3mo ago |
| Llama 70B (H100 GPU Cluster) 3.1 | DeepFusionKernel | Throughput | 894.32 | 27 | 3mo ago |
| ShareGPT | | Latency (ms/token) | 2.4 | 5 | 2mo ago |
| Llama-2-70B | Pre3 | Per-step Decoding Latency | 0.2163 | 4 | 3mo ago |
| Llama-3-8B | Pre3 | Decode Time per Step | 0.5172 | 4 | 3mo ago |
| Bitext Telco Gradual Drift | ODD | EM | 0.037 | 3 | 3mo ago |
| Bitext Telco Incremental Drift | ODD | EM | 0.052 | 3 | 3mo ago |
| Bitext Telco Abrupt Drift | ODD | EM | 9.6 | 3 | 3mo ago |
| LLaMA 128K context 3.1-8B | | Dense Latency (ms) | 72.8 | 1 | 12d ago |
| LLaMA 64K context 3.1-8B | | Dense Latency (ms) | 62.6 | 1 | 12d ago |
| LLaMA 32K context 3.1-8B | | Dense Latency (ms) | 61.2 | 1 | 12d ago |
| LLaMA 8K context 3.1-8B | | Dense Latency (ms) | 60.9 | 1 | 12d ago |