
LLM Serving on LLaMA-2 70B chatbot workload

TTFT (ms): 45.2

GPU-Only

Updated 4mo ago

Evaluation Results

Method	Links	TTFT (ms)	(unlabeled)	(unlabeled)	(unlabeled)
GPU-Only 2025.12		45.2	18.3	19.4	21.2
CXL-NoSpec 2025.12		46.8	23.5	27.3	31.6
CXL-SpecKV 2025.12		47.1	19.8	21.7	23.8
CPU Offload 2025.12		52.7	28.6	35.4	42.8
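
To compare the methods directly, the TTFT figures can be expressed as a percentage overhead relative to GPU-Only. The sketch below assumes that the first numeric column of the table is TTFT in milliseconds (it matches the 45.2 ms headline figure for GPU-Only). The names `ttft_ms` and `overhead_pct` are hypothetical helpers introduced for illustration only, not part of any benchmark tooling.

```python
# TTFT values (ms) copied from the first numeric column of the results table.
# Assumption: that column is TTFT, since 45.2 matches the GPU-Only headline.
ttft_ms = {
    "GPU-Only": 45.2,
    "CXL-NoSpec": 46.8,
    "CXL-SpecKV": 47.1,
    "CPU Offload": 52.7,
}


def overhead_pct(baseline="GPU-Only"):
    """Return each method's TTFT overhead over the baseline, in percent."""
    base = ttft_ms[baseline]
    return {
        method: round((value - base) / base * 100, 1)
        for method, value in ttft_ms.items()
    }


if __name__ == "__main__":
    for method, pct in overhead_pct().items():
        print(f"{method}: {pct:+.1f}%")
```

The same calculation applies to the other three columns once their metrics are identified.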