LLM Serving on LLaMA-2 70B chatbot workload
[Chart: TTFT (ms) by method. Highlighted point: GPU-Only, 45.2 ms, Dec 11, 2025. Selectable metrics: TTFT (ms), Decode Latency (ms), P95 Latency (ms), P99 Latency (ms).]
Evaluation Results
Method       Setting        Date     TTFT (ms)  Decode Latency (ms)  P95 Latency (ms)  P99 Latency (ms)
GPU-Only     Batch size=32  2025.12  45.2       18.3                 19.4              21.2
CXL-NoSpec   Batch size=32  2025.12  46.8       23.5                 27.3              31.6
CXL-SpecKV   Batch size=32  2025.12  47.1       19.8                 21.7              23.8
CPU Offload  Batch size=32  2025.12  52.7       28.6                 35.4              42.8
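
The absolute latencies above are easier to compare as overheads relative to a baseline. The sketch below is illustrative only: it copies the reported values into a dictionary and computes the percentage increase of each method over GPU-Only. Treating GPU-Only as the baseline, and the names `results` and `overhead_pct`, are assumptions made for this example and are not part of the benchmark.

```python
# Latency values in milliseconds, copied from the results above (batch size 32).
results = {
    "GPU-Only":    {"ttft": 45.2, "decode": 18.3, "p95": 19.4, "p99": 21.2},
    "CXL-NoSpec":  {"ttft": 46.8, "decode": 23.5, "p95": 27.3, "p99": 31.6},
    "CXL-SpecKV":  {"ttft": 47.1, "decode": 19.8, "p95": 21.7, "p99": 23.8},
    "CPU Offload": {"ttft": 52.7, "decode": 28.6, "p95": 35.4, "p99": 42.8},
}

METRICS = ("ttft", "decode", "p95", "p99")


def overhead_pct(method, metric, baseline="GPU-Only"):
    """Percentage increase of `method` over `baseline` for one metric.

    The choice of GPU-Only as the default baseline is an assumption of
    this sketch, not something the benchmark page prescribes.
    """
    base = results[baseline][metric]
    return 100.0 * (results[method][metric] - base) / base


if __name__ == "__main__":
    # Print one row per method with the relative overhead for each metric.
    for method in results:
        cells = "  ".join(
            f"{metric}={overhead_pct(method, metric):+6.1f}%" for metric in METRICS
        )
        print(f"{method:<12} {cells}")
```

Read this way, CXL-SpecKV's decode latency is roughly 8% above GPU-Only, while CXL-NoSpec's is roughly 28% above it.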