Our new X account is live! Follow @wizwand_team for updates
Home
/
Benchmarks
Decoding Latency on Llama-3.1-8B 16k sequence length v1 (inference)
Loading...
0.024
Decoding latency (s)
Full Cache
0.01528
0.07414
0.133
0.19186
May 26, 2025
Decoding latency (s)
Updated 4d ago
Evaluation Results
Method
Method
Links
Decoding latency (s)
Full Cache
Hardware=NVIDIA A100 (...
2025.05
0.024
Full Cache
Hardware=NVIDIA RTX 30...
2025.05
0.033
TailorKV
Hardware=NVIDIA A100 (...
2025.05
0.045
TailorKV
Hardware=NVIDIA RTX 30...
2025.05
0.062
PQCache
Hardware=NVIDIA A100 (...
2025.05
0.104
OffloadCache
Hardware=NVIDIA A100 (...
2025.05
0.124
PQCache
Hardware=NVIDIA RTX 30...
2025.05
0.126
OffloadCache
Hardware=NVIDIA RTX 30...
2025.05
0.242
Feedback
Search any
task
Search any
task