
Decoding Latency on Llama-3.1-8B 32k sequence length v1 (inference)

Best result: Full Cache, 0.033 s decoding latency


Evaluation Results

Method	Date	Decoding Latency (s)
Full Cache	2025.05	0.033
Full Cache	2025.05	0.042
TailorKV	2025.05	0.047
TailorKV	2025.05	0.067
PQCache	2025.05	0.105
OffloadCache	2025.05	0.227
OffloadCache	2025.05	0.46