
Decoding Latency on Llama-3.1-8B 32k sequence length v1 (inference)

Decoding Latency (s): 0.033

Full Cache

[Chart: Decoding Latency (s) over time; data point dated May 26, 2025]
Updated 1mo ago

Evaluation Results

Date      Decoding Latency (s)
2025.05   0.033
2025.05   0.042
2025.05   0.047
2025.05   0.067
2025.05   0.105
2025.05   0.227
2025.05   0.46