
Decoding Latency on Llama-3.1-8B 64k sequence length v1

Top result: Full Cache, Decoding Latency 0.05 s

Updated 4mo ago

Evaluation Results

Method	Date	Decoding Latency (s)
Full Cache	2025.05	0.05
TailorKV	2025.05	0.054
TailorKV	2025.05	0.074
PQCache	2025.05	0.108
OffloadCache	2025.05	0.435
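
To make the gaps between methods easier to compare, the latencies can be expressed as a multiple of the Full Cache baseline. The sketch below is only an illustration: the values are copied from the table above, the function name `relative_slowdown` is hypothetical, and the two TailorKV rows are kept as separate entries because the listing does not say how they differ.

```python
# Decoding latencies in seconds, copied from the results table above.
# The two TailorKV entries are kept separate; the source does not
# explain what distinguishes them.
latencies = [
    ("Full Cache", 0.05),
    ("TailorKV", 0.054),
    ("TailorKV", 0.074),
    ("PQCache", 0.108),
    ("OffloadCache", 0.435),
]


def relative_slowdown(rows, baseline="Full Cache"):
    """Return (method, latency / baseline latency) for every row.

    `baseline` names the row used as the reference; the first row
    with that name is taken as the reference latency.
    """
    base = next(lat for name, lat in rows if name == baseline)
    return [(name, lat / base) for name, lat in rows]


if __name__ == "__main__":
    for name, ratio in relative_slowdown(latencies):
        print(f"{name:<13} {ratio:.2f}x")
```

A ratio of 1.00x means the method matches Full Cache; larger values mean slower decoding on this benchmark.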