Decoding Latency on Llama-3.1-8B 16k sequence length v1 (inference)

Decoding latency (s): 0.024 (Full Cache)

Updated 4mo ago

Evaluation Results

Method	Links	Decoding latency (s)
Full Cache 2025.05		0.024
Full Cache 2025.05		0.033
TailorKV 2025.05		0.045
TailorKV 2025.05		0.062
PQCache 2025.05		0.104
OffloadCache 2025.05		0.124
PQCache 2025.05		0.126
OffloadCache 2025.05		0.242