Large Language Model Inference on a Synthetic Heavy-Tail Workload (Pareto Distribution)

17.02 Throughput (req/s)

BatchLLM

Updated 3mo ago

Evaluation Results

Method	Links	Throughput (req/s)	Relative throughput (vLLM = 1)
BatchLLM 2024.11		17.02	3.2
vLLM 2024.11		5.26	1