
Decoding Latency on Llama-2-7B 16k sequence length v1 (inference)

Best result: TailorKV, decoding latency 0.041 s


Evaluation Results

Method	Date	Decoding Latency (s)
TailorKV	2025.05	0.041
Full Cache	2025.05	0.045
TailorKV	2025.05	0.067
PQCache	2025.05	0.108
OffloadCache	2025.05	0.433
OffloadCache	2025.05	0.893
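To read the table relative to the uncompressed baseline, each latency can be divided by the Full Cache latency (0.045 s); a ratio below 1.0 means faster decoding than keeping the full KV cache. The sketch below does only this arithmetic on the listed values. The page does not say what distinguishes the two TailorKV rows or the two OffloadCache rows, so they are kept as separate entries. The names `RESULTS`, `FULL_CACHE_LATENCY` and `relative_to_full_cache` are illustrative, not taken from any benchmark code.

```python
# Decoding latency in seconds, copied from the evaluation table.
# Duplicate method names are kept as separate rows because the page
# reports different values for them without naming the configuration.
RESULTS = [
    ("TailorKV", 0.041),
    ("Full Cache", 0.045),
    ("TailorKV", 0.067),
    ("PQCache", 0.108),
    ("OffloadCache", 0.433),
    ("OffloadCache", 0.893),
]

# Baseline: decoding with the full, uncompressed KV cache.
FULL_CACHE_LATENCY = 0.045


def relative_to_full_cache(latency: float) -> float:
    """Return latency / Full Cache latency; values below 1.0 are faster."""
    return latency / FULL_CACHE_LATENCY


if __name__ == "__main__":
    for method, latency in RESULTS:
        ratio = relative_to_full_cache(latency)
        print(f"{method:<13} {latency:.3f} s  {ratio:.2f}x Full Cache")
```

With these values the fastest TailorKV row comes out at about 0.91x the Full Cache latency, and the slower OffloadCache row at roughly 19.8x.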