LLM Decoding

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Results	Last Updated
LLM Decoding	-	cuBLASLt Throughput	40,000	78	4mo ago
Long Context 32K	SparDA	Decode Throughput (tok/s)	1,218.1	48	1mo ago
Llama 3.1 70B	DeepFusionKernel	Throughput	3,119.55	48	4mo ago
Long Context 64K	SparDA	Decoding Throughput (tok/s)	1,042.6	42	1mo ago
Long Context 96K	SparDA	Decode Throughput (tok/s)	930.5	38	1mo ago
Long Context 128K	SparDA	Throughput (tok/s)	876.9	33	1mo ago
Llama 3.1 70B (H100 GPU Cluster)	DeepFusionKernel	Throughput	894.32	27	4mo ago
Llama-2 70B	-	Throughput (tokens/s)	36.1	6	1mo ago
ShareGPT	-	Latency (ms/token)	2.4	5	4mo ago
Llama-2-70B	Pre3	Per-step Decoding Latency	0.2163	4	4mo ago
Llama-3-8B	Pre3	Decode Time per Step	0.5172	4	4mo ago
Bitext Telco Gradual Drift	ODD	EM	0.037	3	4mo ago
Bitext Telco Incremental Drift	ODD	EM	0.052	3	4mo ago
Bitext Telco Abrupt Drift	ODD	EM	9.6	3	4mo ago
LLaMA 3.1-8B, 128K context	-	Dense Latency (ms)	72.8	1	2mo ago
LLaMA 3.1-8B, 64K context	-	Dense Latency (ms)	62.6	1	2mo ago
LLaMA 3.1-8B, 32K context	-	Dense Latency (ms)	61.2	1	2mo ago
LLaMA 3.1-8B, 8K context	-	Dense Latency (ms)	60.9	1	2mo ago
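The table mixes throughput metrics (tokens/s) with per-token latency metrics (ms/token). For a single decoding stream the two are reciprocals: tokens/s = 1000 / (ms/token). The sketch below applies that conversion. It assumes one sequence with no batching; aggregate batched throughput figures cannot be inverted this way, so it should not be used to compare the throughput rows against the latency rows directly. The function names are illustrative, not from any listed method.

```python
def ms_per_token_to_tokens_per_s(latency_ms: float) -> float:
    """Convert per-token decode latency (ms/token) to single-stream
    throughput (tokens/s). Assumes one sequence, no batching."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def tokens_per_s_to_ms_per_token(throughput: float) -> float:
    """Inverse conversion: single-stream tokens/s back to ms/token."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / throughput


if __name__ == "__main__":
    # ShareGPT row: 2.4 ms/token for a single stream
    print(round(ms_per_token_to_tokens_per_s(2.4), 1))  # 416.7
```

Under the single-stream assumption, the ShareGPT latency of 2.4 ms/token corresponds to roughly 417 tokens/s.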
