
LLM Inference on Qwen3-8B: 2k prompts, prefill-heavy workload

106,324 Throughput (tok/s)

v1

Updated 1mo ago

Evaluation Results

Method	Links	Throughput (tok/s)
v1 2026.05		106,324	805	36.2
EB+ 2026.05		105,432	843	36.2
v1 2026.05		88,488	304	23.4
EB+ 2026.05		87,268	307	23.8
2P+2D 2026.05		82,400	662	8.5
2P+2D 2026.05		78,486	5,741	8.2
2P+2D 2026.05		77,117	2,614	8.3
3P+1D 2026.05		70,255	412	13.1
v1 2026.05		67,278	133	16.1
EB+ 2026.05		65,297	124	16.7
v1 2026.05		47,947	1,141	85.3
EB+ 2026.05		47,322	1,205	86.1
2P+2D 2026.05		44,892	3,369	23.3
2P+2D 2026.05		43,326	9,245	23.9
2P+2D 2026.05		42,577	476	23.1
1P+3D 2026.05		40,638	6,103	6.4
1P+3D 2026.05		40,535	2,763	6.4
1P+3D 2026.05		39,925	12,357	6.4
v1 2026.05		37,623	380	57.8
EB+ 2026.05		37,208	381	58.5
3P+1D 2026.05		31,490	482	32.6
EB+ 2026.05		26,207	171	42.8
v1 2026.05		25,994	185	43.0
1P+3D 2026.05		23,245	4,385	14.6
1P+3D 2026.05		23,102	10,283	14.5
1P+3D 2026.05		22,531	21,412	14.6