Decoding Latency on Llama-3.1-8B 16k sequence length v1 (inference)

Decoding latency (s): 0.024

Full Cache

[Chart: decoding latency (s) by date; data point at May 26, 2025]
Updated 4d ago

Evaluation Results

Date       Decoding latency (s)
2025.05    0.024
2025.05    0.033
2025.05    0.045
2025.05    0.062
2025.05    0.104
2025.05    0.124
2025.05    0.126
2025.05    0.242