Large Language Model Serving on vLLM benchmark (128 prompts, 32 pre-fill, 256 generation tokens)
[Chart: TTFT (ms) and Iteration Latency (ms) over time. Highlighted entry: DeInfer, TTFT 76 ms, Apr 20, 2026.]
Updated 1mo ago
Evaluation Results
Method   Configuration               Links    TTFT (ms)  Iteration Latency (ms)
DeInfer  Model=OPT-30B, Low-ran...   2026.04  76         39
DeInfer  Model=OPT-30B, Low-ran...   2026.04  77         68
DeInfer  Model=LLaMA-65B, Low-r...   2026.04  81         59
DeInfer  Model=LLaMA-65B, Low-r...   2026.04  82         76
DeInfer  Model=LLaMA-3-70B, Low...   2026.04  83         79
DeInfer  Model=LLaMA-3-70B, Low...   2026.04  83         74
Base     Model=OPT-30B, Low-ran...   2026.04  6,999      311
Base     Model=LLaMA-65B, Low-r...   2026.04  19,341     764
Base     Model=LLaMA-3-70B, Low...   2026.04  21,740     812
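
The gap between the Base and DeInfer results above is easier to read as a ratio. The following is a minimal sketch that divides each Base value by the matching DeInfer value for the same model. The dictionary names and the `speedups` helper are illustrative and not part of the benchmark. The configuration strings are truncated in the listing, so the two DeInfer rows per model are kept as separate, unlabeled entries rather than guessed at.

```python
# Values transcribed from the Evaluation Results listing:
# (TTFT in ms, iteration latency in ms).
BASE = {
    "OPT-30B": (6999, 311),
    "LLaMA-65B": (19341, 764),
    "LLaMA-3-70B": (21740, 812),
}

# Two DeInfer rows per model; the setting that distinguishes them is
# truncated in the source, so they are listed in source order only.
DEINFER = {
    "OPT-30B": [(76, 39), (77, 68)],
    "LLaMA-65B": [(81, 59), (82, 76)],
    "LLaMA-3-70B": [(83, 79), (83, 74)],
}


def speedups(model):
    """Return (ttft_ratio, latency_ratio) for each DeInfer row of `model`.

    A ratio above 1 means the DeInfer value is lower than the Base value.
    """
    base_ttft, base_latency = BASE[model]
    return [
        (base_ttft / ttft, base_latency / latency)
        for ttft, latency in DEINFER[model]
    ]


if __name__ == "__main__":
    for model in BASE:
        for ttft_x, latency_x in speedups(model):
            print(f"{model}: TTFT {ttft_x:.1f}x, iteration latency {latency_x:.1f}x")
```

Ratios computed this way only describe the numbers reported in the listing. They say nothing about hardware or settings that the listing does not show.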