Large Language Model Inference on a Synthetic Heavy-Tail Workload (Pareto Distribution)
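The page does not describe how the synthetic workload is built. The following is only a minimal sketch of what a Pareto-distributed, heavy-tailed request stream could look like. It assumes that the Pareto-distributed quantity is the per-request length in tokens. The function name pareto_lengths and the parameters alpha, min_len, max_len and seed are hypothetical and do not come from the benchmark.

```python
import random

def pareto_lengths(n, alpha=1.5, min_len=16, max_len=4096, seed=0):
    """Draw n hypothetical request lengths (tokens) from a Pareto distribution.

    alpha (shape), min_len (scale) and max_len (cap) are illustrative
    assumptions, not the benchmark's actual parameters.
    """
    rng = random.Random(seed)  # seeded so the workload is reproducible
    lengths = []
    for _ in range(n):
        # paretovariate(alpha) returns a value >= 1; scaling by min_len
        # makes min_len the smallest possible length.
        x = min_len * rng.paretovariate(alpha)
        # Cap the tail so a single request cannot exceed a context limit.
        lengths.append(min(int(x), max_len))
    return lengths

lengths = sorted(pareto_lengths(10_000))
median = lengths[len(lengths) // 2]
p99 = lengths[int(0.99 * len(lengths))]
print(f"median={median}, p99={p99}, max={lengths[-1]}")
```

A smaller alpha makes the tail heavier. With a heavier tail, a few very long requests dominate the total work, and this is the kind of imbalance that a throughput-oriented batching scheme has to handle.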
[Chart: Throughput (req/s) by date. Best result: BatchLLM, 17.02 req/s (Nov 29, 2024).]
Evaluation Results

Method   | Setting                   | Date    | Throughput (req/s) | Speedup (vs. vLLM)
BatchLLM | Model=Qwen 2.5 7B, Pre... | 2024.11 | 17.02              | 3.2
vLLM     | Model=Qwen 2.5 7B, Pre... | 2024.11 | 5.26               | 1
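The Speedup column appears to be each method's throughput divided by vLLM's throughput. This reading is an assumption, but it matches vLLM's entry of 1 and the ratio 17.02 / 5.26 ≈ 3.24. The short check below recomputes the column from the two throughput values. The names results and speedup are illustrative only.

```python
# Throughputs (req/s) taken from the Evaluation Results table.
results = {"BatchLLM": 17.02, "vLLM": 5.26}

def speedup(throughput, baseline):
    """Ratio of a method's throughput to the baseline's throughput."""
    return throughput / baseline

baseline = results["vLLM"]  # assumed baseline, since vLLM's speedup is 1
for method, tput in results.items():
    print(f"{method}: {speedup(tput, baseline):.1f}x")
```

Rounded to one decimal, this prints 3.2x for BatchLLM and 1.0x for vLLM, which agrees with the table.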