LLM Inference on Qwen 0.5B Instruct 2.5
[Chart: Throughput (Tok/s) by method. Top result: 185.5 Tok/s, CUDA (compiled, RTX 5090), Feb 9, 2026. Selectable metrics: Throughput (Tok/s), 95% CI (Throughput Lower Bound), Coefficient of Variation (CV), Time To First Token (ms), Speedup vs CUDA. Updated 13d ago.]
Evaluation Results
| Method | Tags | Throughput (Tok/s) | 95% CI (Throughput Lower Bound) | Coefficient of Variation (CV) | Time To First Token (ms) | Speedup vs CUDA |
|---|---|---|---|---|---|---|
| CUDA (compiled, RTX 5090) | Backend=CUDA (compiled...; 2026.02 | 185.5 | 184.2 | 0.9 | 5.4 | 1 |
| CUDA (eager, RTX 5090) | Backend=CUDA (eager, R...; 2026.02 | 182.9 | 182.3 | 0.4 | 5.5 | 0.99 |
| MPS (Apple M2) | Backend=MPS (Apple M2)...; 2026.02 | 47.8 | 47.7 | 0.9 | 20.9 | 0.26 |
| torch-webgpu (fused, RTX 5090) | Backend=torch-webgpu (...; 2026.02 | 21 | 20.7 | 4 | 41.6 | 0.11 |
| CPU (AMD Ryzen, eager) | Backend=CPU (AMD Ryzen...; 2026.02 | 13.7 | 13.4 | 3.2 | 72.8 | 0.07 |
| ONNX Runtime (WebGPU, RTX 5090) | Backend=ONNX Runtime (...; 2026.02 | 13.1 | 13 | 1.1 | 73.5 | 0.07 |
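
The Speedup vs CUDA values appear to be each method's throughput divided by the throughput of CUDA (compiled, RTX 5090). The page does not state this; it is an inference from the reported numbers. The sketch below reproduces the column under that assumption. The function name `speedup_vs_baseline` and the dictionary layout are illustrative and not part of the benchmark.

```python
# Throughput (Tok/s) values copied from the results above.
throughput = {
    "CUDA (compiled, RTX 5090)": 185.5,
    "CUDA (eager, RTX 5090)": 182.9,
    "MPS (Apple M2)": 47.8,
    "torch-webgpu (fused, RTX 5090)": 21.0,
    "CPU (AMD Ryzen, eager)": 13.7,
    "ONNX Runtime (WebGPU, RTX 5090)": 13.1,
}

# Assumption: the baseline for "Speedup vs CUDA" is the compiled CUDA run.
BASELINE = "CUDA (compiled, RTX 5090)"


def speedup_vs_baseline(results, baseline):
    """Return throughput relative to the baseline, rounded to two decimals."""
    base = results[baseline]
    return {name: round(tps / base, 2) for name, tps in results.items()}


speedups = speedup_vs_baseline(throughput, BASELINE)
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}")
```

Under this assumption the computed ratios agree with the reported column to two decimals, which suggests the WebGPU-based backends on the RTX 5090 run at roughly 7% to 11% of the compiled CUDA throughput.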