LLM Inference on Qwen 0.5B Instruct 2.5

Throughput (Tok/s): 185.5

CUDA (compiled, RTX 5090)

Updated 13d ago

Evaluation Results

Method	Links	Throughput (Tok/s)
CUDA (compiled, RTX 5090) 2026.02		185.5	184.2	0.9	5.4	1
CUDA (eager, RTX 5090) 2026.02		182.9	182.3	0.4	5.5	0.99
MPS (Apple M2) 2026.02		47.8	47.7	0.9	20.9	0.26
torch-webgpu (fused, RTX 5090) 2026.02		21	20.7	4	41.6	0.11
CPU (AMD Ryzen, eager) 2026.02		13.7	13.4	3.2	72.8	0.07
ONNX Runtime (WebGPU, RTX 5090) 2026.02		13.1	13	1.1	73.5	0.07
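The last column of the table is unlabeled in this listing. As a rough check, and purely as an assumption about its meaning, it appears consistent with each row's throughput (the first numeric column) divided by the top entry's throughput, rounded to two decimals. The dictionary names below are illustrative only; the values are copied from the table.

```python
# Throughput in tok/s, copied from the first numeric column of the table.
throughput = {
    "CUDA (compiled, RTX 5090)": 185.5,
    "CUDA (eager, RTX 5090)": 182.9,
    "MPS (Apple M2)": 47.8,
    "torch-webgpu (fused, RTX 5090)": 21.0,
    "CPU (AMD Ryzen, eager)": 13.7,
    "ONNX Runtime (WebGPU, RTX 5090)": 13.1,
}

# Values copied from the last (unlabeled) column of the table.
listed_last_column = {
    "CUDA (compiled, RTX 5090)": 1.0,
    "CUDA (eager, RTX 5090)": 0.99,
    "MPS (Apple M2)": 0.26,
    "torch-webgpu (fused, RTX 5090)": 0.11,
    "CPU (AMD Ryzen, eager)": 0.07,
    "ONNX Runtime (WebGPU, RTX 5090)": 0.07,
}

# Assumption: the last column is throughput relative to the fastest entry.
best = max(throughput.values())
relative = {name: round(tps / best, 2) for name, tps in throughput.items()}

for name, ratio in relative.items():
    matches = abs(ratio - listed_last_column[name]) < 1e-9
    print(f"{name}: computed {ratio:.2f}, listed {listed_last_column[name]:.2f}, match={matches}")
```

Under this reading, the compiled and eager CUDA runs are within about 1% of each other, while torch-webgpu on the same RTX 5090 reaches roughly a ninth of the compiled CUDA throughput.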