
Decoding Latency on Llama-2-7B 32k sequence length v1 (inference)

Decoding Latency (s): 0.062

TailorKV

Updated 5mo ago

Evaluation Results

Method	Links	Decoding Latency (s)
TailorKV 2025.05		0.062
Full Cache 2025.05		0.077
TailorKV 2025.05		0.087
PQCache 2025.05		0.111
OffloadCache 2025.05		0.838
OffloadCache 2025.05		1.776