Decoding Latency on Llama-2-7B 64k sequence length v1 (inference)
[Chart: Decoding Latency (s). Top result shown: TailorKV, 0.098 s (May 26, 2025).]
Evaluation Results

| Method       | Details                  | Date    | Decoding Latency (s) |
|--------------|--------------------------|---------|----------------------|
| TailorKV     | Hardware=NVIDIA A100 (... | 2025.05 | 0.098                |
| PQCache      | Hardware=NVIDIA A100 (... | 2025.05 | 0.114                |
| TailorKV     | Hardware=NVIDIA RTX 30... | 2025.05 | 0.135                |
| Full Cache   | Hardware=NVIDIA A100 (... | 2025.05 | 0.14                 |
| OffloadCache | Hardware=NVIDIA A100 (... | 2025.05 | 1.767                |
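
The latencies listed above can be read as speedups relative to the Full Cache entry. The following is a minimal sketch of that arithmetic, not part of the benchmark itself. The dictionary keys and the helper name `speedup_vs_baseline` are illustrative assumptions, and the hardware labels are kept truncated exactly as the page shows them. Note that one TailorKV entry was measured on different hardware, so its ratio against the A100 baseline is not a like-for-like comparison.

```python
# Decoding latencies in seconds, copied from the results above.
# Keys are illustrative labels (an assumption); hardware names stay truncated as listed.
latencies = {
    "TailorKV (A100)": 0.098,
    "PQCache (A100)": 0.114,
    "TailorKV (RTX 30...)": 0.135,
    "Full Cache (A100)": 0.14,
    "OffloadCache (A100)": 1.767,
}


def speedup_vs_baseline(results, baseline):
    """Return baseline_latency / latency for every entry (hypothetical helper).

    Values above 1 mean the entry decodes faster than the baseline,
    values below 1 mean it decodes slower.
    """
    base = results[baseline]
    return {name: base / value for name, value in results.items()}


speedups = speedup_vs_baseline(latencies, "Full Cache (A100)")

# Print entries from fastest to slowest relative to the baseline.
for name, ratio in sorted(speedups.items(), key=lambda item: -item[1]):
    print(f"{name:24s} {ratio:.2f}x")
```

A ratio is used rather than a difference so that entries remain comparable at a glance even though OffloadCache is an order of magnitude slower than the rest.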