Context Length

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Inference Efficiency | 32k context length efficiency Llama-3-8B (test) | Time To First Token (s): 4.12 | 7 |
| End-to-end decode throughput | context length 128K | Throughput (tok/s): 108 | 6 |
| End-to-end decode throughput | Context Length 32K | Decode Throughput (tok/s): 183 | 6 |
| End-to-end decode throughput | 8K Context Length | Throughput (tok/s): 268.5 | 6 |
| Efficiency Analysis | Context Length 32K | Theoretical Compute (TFLOPs): 928 | 5 |
| Efficiency Analysis | Context Length 16K | Theoretical Compute (TFLOPs): 336 | 5 |
| Efficiency Analysis | Context Length 4K | Theoretical Compute (TFLOPs): 60 | 5 |
| Inference Efficiency | 90k Context Length Llama-3.1-8B | Throughput (queries/s): 8.9 | 4 |
| Inference Efficiency | 30k Context Length (Llama-3.1-8B) | Inference Throughput (QPS): 15.8 | 4 |
| Inference Efficiency | 30k Context Length Llama-2-7B | Inference Throughput (QPS): 6.6 | 4 |
| KV Cache Footprint Evaluation | Context Length 128K 1.0 (test) | Effective b_KV (dense): 2,360.6 | 3 |
| KV Cache Footprint Evaluation | Context Length 32K 1.0 (test) | Effective b_KV (dense): 1,863.6 | 3 |
| KV Cache Footprint Evaluation | Context Length 8K 1.0 (test) | Effective KV Cache Size (dense): 1,658.3 | 3 |
| LLM Inference Performance | Context Length 200K | Prefill Time (s): 10.7 | 3 |
| LLM Inference Performance | Context Length 120K | Prefill Time (s): 5.66 | 3 |
| LLM Inference Performance | Context Length 60K | Prefill Time (s): 2.59 | 3 |
| LLM Inference Performance | Context Length 10K | Prefill Time (s): 0.45 | 3 |