SOTA LLM Inference benchmarks and papers with code | Wizwand
| Dataset Name | SOTA Method | Metric | Best Result | Results | Last Updated |
|---|---|---|---|---|---|
| Aggregate mean over Alpaca, CodeAlpaca, HumanEval, LiveCodeBench, Math500, MBPP, MT-Bench | DART | Mean Speedup | 2.87 | 21 | 1mo ago |
| MBPP | DART | Speedup | 3.09 | 21 | 1mo ago |
| Math500 | DART | Speedup | 2.84 | 21 | 1mo ago |
| LiveCodeBench | DART | Speedup | 2.81 | 21 | 1mo ago |
| Alpaca | DART | Speedup | 2.95 | 21 | 1mo ago |
| Transformers (PyTorch) workflow, Qwen3 family (inference) | CryptoTensors | Model Load Time (s) | 1.16 | 18 | 1mo ago |
| ToolBench | AugServe | Goodput (req/s) | 3.9 | 18 | 1mo ago |
| Merge | AugServe | Goodput (req/s) | 1.16 | 18 | 1mo ago |
| ToolBench dataset | AugServe | SLO Attainment | 100 | 9 | 1mo ago |
| Merge dataset | AugServe | SLO Attainment | 54.3 | 9 | 1mo ago |
| Long-Context LLM Inference (Decode) | Reuse | Latency (ms) | 0.13 | 8 | 1mo ago |
| Alpaca, CodeAlpaca, HumanEval, LiveCodeBench, Math500, MBPP, and MT-Bench | DART | Speedup (Alpaca) | 2.61 | 8 | 1mo ago |
| Qwen2.5 0.5B Instruct | CUDA (compiled, RTX 5090) | Throughput (tok/s) | 185.5 | 6 | 13d ago |
| WebLLM, macOS, Apple M2, 16GB unified memory | Qwen2.5-0.5B | Decode Throughput (tok/s) | 46.4 | 6 | 13d ago |
| Long-Context LLM Inference (Prefill) | Kascade | Prefill Latency (ms) | 0.62 | 6 | 1mo ago |
| LLaMA-2 70B, sequence length 2048 | CXL-SpecKV + Comp | Max Batch Size | 384 | 5 | 1mo ago |
| Qwen2.5 1.5B Instruct | CUDA (eager, RTX 5090) | Throughput (tok/s) | 155.3 | 4 | 13d ago |
| WebLLM, Windows 11, RTX PRO 2000 Blackwell 8GB | Qwen2.5-0.5B | Decode Throughput (tok/s) | 51.1 | 4 | 13d ago |
| LLaMA-2 7B | Baseline | Latency (ms) | 1,052.24 | 4 | 1mo ago |
| Qwen2.5 0.5B | Baseline | Latency (ms) | 139.98 | 4 | 1mo ago |
| OPT-125M | Baseline | Latency (ms) | 46.71 | 4 | 1mo ago |
Showing 21 of 21 rows