
LLM Inference Serving Performance on 1024 Input / 32 Output Context

TPOT Speedup vs DeepGEMM: 1.48

RaMP

[Chart: TPOT Speedup vs DeepGEMM; data point dated Apr 28, 2026]
Updated 1mo ago

Evaluation Results

Method    Links
2026.04    1.48    1.34    1.16    1.12    1.19    1.06
2026.04    1.43    1.34    1.15    1.04    1.18    1.08
2026.04    1.24    1.29    1.11    0.95    1.13    1.02