| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Synthetic Context Sequences (test) | Full Cache | Latency (16k context) | 0.019 | 16 | 4d ago |
| Llama-3.1-8B 16k sequence length v1 (inference) | Full Cache | Decoding latency (s) | 0.024 | 8 | 4d ago |
| Llama-3.1-8B 32k sequence length v1 (inference) | Full Cache | Decoding latency (s) | 0.033 | 7 | 4d ago |
| Llama-2-7B 32k sequence length v1 (inference) | TailorKV | Decoding latency (s) | 0.062 | 6 | 4d ago |
| Llama-2-7B 16k sequence length v1 (inference) | TailorKV | Decoding latency (s) | 0.041 | 6 | 4d ago |
| Llama-3.1-8B 64k sequence length v1 | Full Cache | Decoding latency (s) | 0.05 | 5 | 4d ago |
| Llama-2-7B 64k sequence length v1 (inference) | TailorKV | Decoding latency (s) | 0.098 | 5 | 4d ago |
| Llama-2-7B 96k sequence length v1 (inference) | PQCache | Decoding latency (s) | 0.115 | 4 | 4d ago |