LLM Decoding on LLaMA 3.1-8B, 32K context
[Chart: Dense Latency (ms) over time. Series: Dense, 61.2 ms on May 20, 2026. Available metrics: Dense Latency (ms), Cert Latency (ms), Speedup Ratio, Mean K* Count, KV Cache Hit Rate, H2D Transfer Size (MB).]
Updated 13d ago
Evaluation Results
| Method | Settings | Date | Dense Latency (ms) | Cert Latency (ms) | Speedup Ratio | Mean K* Count | KV Cache Hit Rate | H2D Transfer Size (MB) |
|---|---|---|---|---|---|---|---|---|
| Dense | Model=LLaMA 3.1-8B, Ha... | 2026.05 | 61.2 | - | - | - | - | - |
| Runtime-Certified Bounded-Error Quantized Attention | Model=LLaMA 3.1-8B, Ha... | 2026.05 | - | 232.2 | 3.79 | 217 | 99.7 | 114 |