Decoding Latency on Llama-3.1-8B 32k sequence length v1 (inference)

Decoding Latency (s): 0.033

Full Cache

[Chart: Decoding Latency (s); axis ticks from 0.01592 to 0.36179; date shown: May 26, 2025]
Updated 4d ago

Evaluation Results

Date      Decoding Latency (s)
2025.05   0.033
2025.05   0.042
2025.05   0.047
2025.05   0.067
2025.05   0.105
2025.05   0.227
2025.05   0.46