End-to-end decode throughput on 128K Context Length
[Chart: Throughput (tok/s). Top result: Llama-3.1-8B, 108 tok/s, May 13, 2026. Updated 14d ago.]
Evaluation Results
Model          KV Representation   Date      Throughput (tok/s)
Llama-3.1-8B   SphKV               2026.05   108
GPT-oss        SphKV               2026.05   102.4
Qwen2.5-14B    SphKV               2026.05   94.8
Llama-3.1-8B   Dense               2026.05   62.9
GPT-oss        Dense               2026.05   59.9
Qwen2.5-14B    Dense               2026.05   55.7
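
To compare the two KV representations directly, the throughput figures listed above can be turned into per-model speedup ratios. The snippet below is a minimal sketch, not part of the benchmark itself: the `results` dictionary and the `speedup` helper are illustrative names, and the only data used are the six throughput values reported on this page.

```python
# Throughput (tok/s) at 128K context, copied from the results listed above.
# The dictionary layout and the helper name are illustrative assumptions.
results = {
    "Llama-3.1-8B": {"SphKV": 108.0, "Dense": 62.9},
    "GPT-oss": {"SphKV": 102.4, "Dense": 59.9},
    "Qwen2.5-14B": {"SphKV": 94.8, "Dense": 55.7},
}


def speedup(row):
    """Return SphKV throughput divided by Dense throughput for one model."""
    return row["SphKV"] / row["Dense"]


# Ratio of SphKV to Dense decode throughput, per model.
speedups = {model: speedup(row) for model, row in results.items()}

for model, ratio in speedups.items():
    print(f"{model}: {ratio:.2f}x")
```

On these numbers the ratio is close to 1.7x for all three models, so the reported gain from SphKV over Dense is roughly uniform across the models listed.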