
Context Length

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Inference Efficiency | 32k context length efficiency Llama-3-8B (test) | Time To First Token (s): 4.12 | 7 |
| Efficiency Analysis | Context Length 32K | Theoretical Compute (TFLOPs): 928 | 5 |
| Efficiency Analysis | Context Length 16K | Theoretical Compute (TFLOPs): 336 | 5 |
| Efficiency Analysis | Context Length 4K | Theoretical Compute (TFLOPs): 60 | 5 |
| Inference Efficiency | 90k Context Length Llama-3.1-8B | Throughput (queries/s): 8.9 | 4 |
| Inference Efficiency | 30k Context Length (Llama-3.1-8B) | Inference Throughput (QPS): 15.8 | 4 |
| Inference Efficiency | 30k Context Length Llama-2-7B | Inference Throughput (QPS): 6.6 | 4 |
| LLM Inference Performance | Context Length 200K | Prefill Time (s): 10.7 | 3 |
| LLM Inference Performance | Context Length 120K | Prefill Time (s): 5.66 | 3 |
| LLM Inference Performance | Context Length 60K | Prefill Time (s): 2.59 | 3 |
| LLM Inference Performance | Context Length 10K | Prefill Time (s): 0.45 | 3 |