Context Length

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result
Inference Efficiency	32k context length efficiency Llama-3-8B (test)	Time To First Token (s)	4.12	7
End-to-end decode throughput	context length 128K	Throughput (tok/s)	108	6
End-to-end decode throughput	Context Length 32K	Decode Throughput (tok/s)	183	6
End-to-end decode throughput	8K Context Length	Throughput (tok/s)	268.5	6
Efficiency Analysis	Context Length 32K	Theoretical Compute (TFLOPs)	928	5
Efficiency Analysis	Context Length 16K	Theoretical Compute (TFLOPs)	336	5
Efficiency Analysis	Context Length 4K	Theoretical Compute (TFLOPs)	60	5
Inference Efficiency	90k Context Length Llama-3.1-8B	Throughput (queries/s)	8.9	4
Inference Efficiency	30k Context Length (Llama-3.1-8B)	Inference Throughput (QPS)	15.8	4
Inference Efficiency	30k Context Length Llama-2-7B	Inference Throughput (QPS)	6.6	4
KV Cache Footprint Evaluation	Context Length 128K 1.0 (test)	Effective b_KV (dense)	2,360.6	3
KV Cache Footprint Evaluation	Context Length 32K 1.0 (test)	Effective b_KV (dense)	1,863.6	3
KV Cache Footprint Evaluation	Context Length 8K 1.0 (test)	Effective KV Cache Size (dense)	1,658.3	3
LLM Inference Performance	Context Length 200K	Prefill Time (s)	10.7	3
LLM Inference Performance	Context Length 120K	Prefill Time (s)	5.66	3
LLM Inference Performance	Context Length 60K	Prefill Time (s)	2.59	3
LLM Inference Performance	Context Length 10K	Prefill Time (s)	0.45	3
