LLM Inference on Qwen 1.5B Instruct 2.5

Throughput (tok/s): 155.3

CUDA (eager, RTX 5090)

Updated 13d ago

Evaluation Results

Method	Links	Throughput (tok/s)
CUDA (eager, RTX 5090) 2026.02		155.3	154.9	0.6	-	1
MPS (Apple M2) 2026.02		20.6	20.4	2.9	-	0.13
torch-webgpu (fused, RTX 5090) 2026.02		17.9	17.7	3.8	51.3	0.12
torch-webgpu (unfused, RTX 5090) 2026.02		10.4	10.4	0.9	87.9	0.07
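The last column of the table appears consistent with each method's throughput divided by the CUDA (eager, RTX 5090) throughput, rounded to two decimals. The page does not state this, so treat it as an assumption. A minimal sketch of that arithmetic, using only the throughput values from the table (the `relative_to` helper is a hypothetical name introduced for illustration):

```python
# Throughput values (tok/s) copied from the first numeric column of the table.
throughput = {
    "CUDA (eager, RTX 5090)": 155.3,
    "MPS (Apple M2)": 20.6,
    "torch-webgpu (fused, RTX 5090)": 17.9,
    "torch-webgpu (unfused, RTX 5090)": 10.4,
}


def relative_to(baseline: str, results: dict[str, float]) -> dict[str, float]:
    """Return each method's throughput as a fraction of the baseline's.

    Assumption: the table's last column is this ratio, rounded to 2 decimals.
    """
    base = results[baseline]
    return {name: round(value / base, 2) for name, value in results.items()}


ratios = relative_to("CUDA (eager, RTX 5090)", throughput)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Under this assumption the computed ratios can be compared directly against the table's last column; a mismatch would suggest the column measures something else.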