LLM Serving on LLaMA-2 70B chatbot workload
[Chart: TTFT (ms) by method over time. Top result shown: GPU-Only, 45.2 ms (Dec 11, 2025). Other selectable metrics: Decode Latency (ms), P95 Latency (ms), P99 Latency (ms).]
Evaluation Results
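| Method      | Setting       | Date    | TTFT (ms) | Decode Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
|-------------|---------------|---------|-----------|---------------------|------------------|------------------|
| GPU-Only    | Batch size=32 | 2025.12 | 45.2      | 18.3                | 19.4             | 21.2             |
| CXL-NoSpec  | Batch size=32 | 2025.12 | 46.8      | 23.5                | 27.3             | 31.6             |
| CXL-SpecKV  | Batch size=32 | 2025.12 | 47.1      | 19.8                | 21.7             | 23.8             |
| CPU Offload | Batch size=32 | 2025.12 | 52.7      | 28.6                | 35.4             | 42.8             |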
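The table reports absolute latencies. To compare the methods, it can help to express each one as a percentage increase over the GPU-Only baseline. The sketch below does this arithmetic, with the numbers transcribed from the Evaluation Results table. The dictionary keys (`ttft`, `decode`, `p95`, `p99`) and the `overhead_pct` helper are our own illustrative names, not part of the benchmark.

```python
# Latency results (ms) transcribed from the Evaluation Results table
# (LLaMA-2 70B chatbot workload, batch size 32).
# Metric keys and helper names are illustrative, not from the benchmark itself.
RESULTS = {
    "GPU-Only":    {"ttft": 45.2, "decode": 18.3, "p95": 19.4, "p99": 21.2},
    "CXL-NoSpec":  {"ttft": 46.8, "decode": 23.5, "p95": 27.3, "p99": 31.6},
    "CXL-SpecKV":  {"ttft": 47.1, "decode": 19.8, "p95": 21.7, "p99": 23.8},
    "CPU Offload": {"ttft": 52.7, "decode": 28.6, "p95": 35.4, "p99": 42.8},
}

BASELINE = "GPU-Only"


def overhead_pct(method: str, metric: str, baseline: str = BASELINE) -> float:
    """Percent latency increase of `method` over `baseline` for one metric."""
    base = RESULTS[baseline][metric]
    return (RESULTS[method][metric] / base - 1.0) * 100.0


if __name__ == "__main__":
    metrics = ["ttft", "decode", "p95", "p99"]
    # Header row, then one row of percentage overheads per non-baseline method.
    print(f"{'Method':<12}" + "".join(f"{m:>9}" for m in metrics))
    for method in RESULTS:
        if method == BASELINE:
            continue
        row = "".join(f"{overhead_pct(method, m):>8.1f}%" for m in metrics)
        print(f"{method:<12}{row}")
```

By this measure, CXL-SpecKV adds about 8.2% decode latency over GPU-Only, compared with about 28.4% for CXL-NoSpec and 56.3% for CPU Offload. The gap widens at the tail: CPU Offload's P99 latency is roughly double the baseline's.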