
LLM Inference on Qwen3-8B, 2k prompts, decode-heavy workload

46,782 Throughput (tok/s)

EB+

Updated 1mo ago

Evaluation Results

Method	Links	Throughput (tok/s)	(unlabeled)	(unlabeled)
EB+ 2026.05		46,782	314	10.8
v1 2026.05		46,622	251	10.9
1P+3D 2026.05		38,287	1,908	11.9
v1 2026.05		30,696	108	8.7
2P+2D 2026.05		30,567	1,216	16.2
EB+ 2026.05		29,763	110	8.9
1P+3D 2026.05		29,495	715	8.5
2P+2D 2026.05		24,820	438	10.6
EB+ 2026.05		19,336	44	7.2
1P+3D 2026.05		18,861	233	7.2
v1 2026.05		18,414	319	28.2
EB+ 2026.05		18,396	284	28.2
v1 2026.05		18,245	46	7.6
3P+1D 2026.05		17,544	1,720	29.4
2P+2D 2026.05		17,227	179	7.9
1P+3D 2026.05		17,224	1,842	28.8
3P+1D 2026.05		16,682	599	16
3P+1D 2026.05		13,536	220	10.2
2P+2D 2026.05		13,169	1,074	39.1
EB+ 2026.05		12,699	119	21.3
v1 2026.05		12,678	125	21.4
1P+3D 2026.05		12,176	719	21.7
2P+2D 2026.05		10,595	404	25.4
v1 2026.05		7,723	65	18.1
EB+ 2026.05		7,690	62	18.2
3P+1D 2026.05		7,579	2,295	69.4
1P+3D 2026.05		7,469	223	18.6
3P+1D 2026.05		7,021	592	38.9
2P+2D 2026.05		6,848	166	20.4
3P+1D 2026.05		5,450	220	25.6