End-to-end decode throughput on Context Length 32K
[Chart: Decode Throughput (tok/s); plotted point: Llama-3.1-8B, 183 tok/s, May 13, 2026. Updated 14d ago.]
Evaluation Results

Model          KV Representation   Date      Decode Throughput (tok/s)
Llama-3.1-8B   SphKV               2026.05   183
GPT-oss        SphKV               2026.05   172.5
Qwen2.5-14B    SphKV               2026.05   153.9
Llama-3.1-8B   Dense               2026.05   110
GPT-oss        Dense               2026.05   104.9
Qwen2.5-14B    Dense               2026.05   93.3
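
The results listed above pair each model's SphKV throughput with its Dense throughput, so a relative speedup can be read off by dividing one by the other. The sketch below is only an illustration of that arithmetic: the figures are copied from the listing, while the `THROUGHPUT` dictionary and the `speedup` helper are hypothetical names introduced here, not part of the benchmark.

```python
# Decode throughput (tok/s) at 32K context, copied from the results listing.
# The dictionary layout and names are illustrative assumptions.
THROUGHPUT = {
    "Llama-3.1-8B": {"SphKV": 183.0, "Dense": 110.0},
    "GPT-oss": {"SphKV": 172.5, "Dense": 104.9},
    "Qwen2.5-14B": {"SphKV": 153.9, "Dense": 93.3},
}


def speedup(model: str) -> float:
    """Return the ratio of SphKV to Dense decode throughput for one model."""
    row = THROUGHPUT[model]
    return row["SphKV"] / row["Dense"]


if __name__ == "__main__":
    # Print one speedup figure per model, rounded to two decimals.
    for model in THROUGHPUT:
        print(f"{model}: {speedup(model):.2f}x")
```

On these numbers the ratio falls between about 1.64x and 1.67x for all three models. It compares only the reported throughputs and says nothing about accuracy or memory use.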