LLM Decoding on LLaMA 3.1-8B (128K context)
[Chart: Dense Latency (ms) over time. Plotted point: Dense, 72.8 ms, May 20, 2026. Selectable metrics: Dense Latency (ms), Cert Latency (ms), Decoding Ratio, Mean K* Value, Cache Hit Rate, H2D Bandwidth (MB/s).]
Updated 13d ago
Evaluation Results
| Method | Configuration | Date | Dense Latency (ms) | Cert Latency (ms) | Decoding Ratio | Mean K* Value | Cache Hit Rate | H2D Bandwidth (MB/s) |
|---|---|---|---|---|---|---|---|---|
| Dense | Model=LLaMA 3.1-8B, Ha... | 2026.05 | 72.8 | - | - | - | - | - |
| Runtime-Certified Bounded-Error Quantized Attention | Model=LLaMA 3.1-8B, Ha... | 2026.05 | - | 346.7 | 4.76 | 226 | 87.7 | 644 |
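
The page does not define Decoding Ratio. As a rough check, the reported values are consistent with it being the certified method's latency divided by the dense baseline latency. The sketch below works under that assumption; the function name `decoding_ratio` and the variable names are hypothetical and do not come from the benchmark.

```python
# Reported values from the results above (milliseconds per decoding step).
dense_latency_ms = 72.8    # Dense baseline
cert_latency_ms = 346.7    # Runtime-Certified Bounded-Error Quantized Attention


def decoding_ratio(cert_ms: float, dense_ms: float) -> float:
    """Assumed definition: certified latency relative to dense latency."""
    if dense_ms <= 0:
        raise ValueError("dense latency must be positive")
    return cert_ms / dense_ms


ratio = decoding_ratio(cert_latency_ms, dense_latency_ms)
print(f"{ratio:.2f}")  # prints 4.76, matching the reported Decoding Ratio
```

If this reading is right, a ratio above 1 means the certified method is slower per decoding step than the dense baseline in this configuration.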