Decoding Latency on Llama-2-7B 16k sequence length v1 (inference)
[Chart: Decoding Latency (s) by method over time. Top entry: TailorKV, 0.041 s, May 26, 2025.]
Evaluation Results
Method        Setting                     Date     Decoding Latency (s)
TailorKV      Hardware=NVIDIA A100 (...   2025.05  0.041
Full Cache    Hardware=NVIDIA A100 (...   2025.05  0.045
TailorKV      Hardware=NVIDIA RTX 30...   2025.05  0.067
PQCache       Hardware=NVIDIA A100 (...   2025.05  0.108
OffloadCache  Hardware=NVIDIA A100 (...   2025.05  0.433
OffloadCache  Hardware=NVIDIA RTX 30...   2025.05  0.893