Decoding Latency on Llama-2-7B 32k sequence length v1 (inference)
Top result: TailorKV, 0.062 s decoding latency
[Chart: Decoding Latency (s) by date; data point at May 26, 2025]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Decoding Latency (s) |
| --- | --- | --- | --- |
| TailorKV | Hardware=NVIDIA A100 (... | 2025.05 | 0.062 |
| Full Cache | Hardware=NVIDIA A100 (... | 2025.05 | 0.077 |
| TailorKV | Hardware=NVIDIA RTX 30... | 2025.05 | 0.087 |
| PQCache | Hardware=NVIDIA A100 (... | 2025.05 | 0.111 |
| OffloadCache | Hardware=NVIDIA A100 (... | 2025.05 | 0.838 |
| OffloadCache | Hardware=NVIDIA RTX 30... | 2025.05 | 1.776 |
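
One way to read these numbers is relative to the Full Cache entry. The sketch below is illustrative only: the helper `relative_latency` and the choice of Full Cache as the baseline are assumptions, not part of the benchmark page, and the hardware labels are kept truncated exactly as they appear above. Ratios for the RTX rows are taken against an A100 baseline, so they are indicative rather than a like-for-like comparison.

```python
# Latencies in seconds, copied from the results above.
# Hardware labels are truncated on the page and are left as-is.
results = [
    ("TailorKV", "NVIDIA A100 (...", 0.062),
    ("Full Cache", "NVIDIA A100 (...", 0.077),
    ("TailorKV", "NVIDIA RTX 30...", 0.087),
    ("PQCache", "NVIDIA A100 (...", 0.111),
    ("OffloadCache", "NVIDIA A100 (...", 0.838),
    ("OffloadCache", "NVIDIA RTX 30...", 1.776),
]


def relative_latency(rows, baseline_method="Full Cache"):
    """Return (method, hardware, latency / baseline latency) for each row.

    Hypothetical helper: the baseline is the first row whose method name
    matches `baseline_method`. A ratio below 1 means faster than the baseline.
    """
    baseline = next(lat for method, _, lat in rows if method == baseline_method)
    return [(method, hw, lat / baseline) for method, hw, lat in rows]


ratios = relative_latency(results)
for method, hw, ratio in ratios:
    print(f"{method:<13} {hw:<18} {ratio:6.2f}x")
```

A ratio under 1.00x marks an entry that decodes faster than Full Cache; larger ratios mark slower entries.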