Decoding Latency on Llama-3.1-8B 32k sequence length v1 (inference)
[Chart: Decoding Latency (s) over time. Best result: Full Cache, 0.033 s (May 26, 2025).]
Evaluation Results
Method        Hardware            Date     Decoding Latency (s)
Full Cache    NVIDIA A100 (...    2025.05  0.033
Full Cache    NVIDIA RTX 30...    2025.05  0.042
TailorKV      NVIDIA A100 (...    2025.05  0.047
TailorKV      NVIDIA RTX 30...    2025.05  0.067
PQCache       NVIDIA A100 (...    2025.05  0.105
OffloadCache  NVIDIA A100 (...    2025.05  0.227
OffloadCache  NVIDIA RTX 30...    2025.05  0.46
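The results cover two GPUs, so methods are easiest to compare against the Full Cache baseline on the same hardware (lower latency is better). The Python sketch below transcribes the rows above and expresses each method's latency as a multiple of Full Cache. The hardware strings are kept truncated exactly as the page shows them, and the names RESULTS and relative_to_full_cache are hypothetical, chosen for illustration only.

```python
from collections import defaultdict

# Rows transcribed from the results table: (method, hardware label as shown
# on the page, decoding latency in seconds). Hardware labels are truncated
# on the source page and are deliberately left that way here.
RESULTS = [
    ("Full Cache", "NVIDIA A100 (...", 0.033),
    ("Full Cache", "NVIDIA RTX 30...", 0.042),
    ("TailorKV", "NVIDIA A100 (...", 0.047),
    ("TailorKV", "NVIDIA RTX 30...", 0.067),
    ("PQCache", "NVIDIA A100 (...", 0.105),
    ("OffloadCache", "NVIDIA A100 (...", 0.227),
    ("OffloadCache", "NVIDIA RTX 30...", 0.46),
]


def relative_to_full_cache(results):
    """Return {hardware: {method: latency / Full Cache latency}}.

    Each method is normalized by the Full Cache result on the same hardware,
    so a value of 1.0 means "as fast as Full Cache" and larger is slower.
    """
    # One Full Cache baseline per hardware label.
    baseline = {hw: lat for method, hw, lat in results if method == "Full Cache"}
    ratios = defaultdict(dict)
    for method, hw, lat in results:
        ratios[hw][method] = lat / baseline[hw]
    return dict(ratios)


if __name__ == "__main__":
    for hw, by_method in relative_to_full_cache(RESULTS).items():
        # Print fastest to slowest within each hardware group.
        for method, ratio in sorted(by_method.items(), key=lambda kv: kv[1]):
            print(f"{hw:<18} {method:<13} {ratio:5.2f}x")
```

On the A100 rows, for example, this gives TailorKV at about 1.42x the Full Cache latency, PQCache at about 3.18x, and OffloadCache at about 6.88x. Normalizing per hardware avoids mixing GPU speed differences into the method comparison.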