LLM Inference on Qwen 1.5B Instruct 2.5
[Chart: Throughput (tok/s). Leading result: 155.3, CUDA (eager, RTX 5090), Feb 9, 2026.]
Evaluation Results
| Method | Date | Throughput (tok/s) | Throughput 95% CI | Coefficient of Variation (CV) | Time To First Token (ms) | Relative Performance vs CUDA |
| --- | --- | --- | --- | --- | --- | --- |
| CUDA (eager, RTX 5090) | 2026.02 | 155.3 | 154.9 | 0.6 | - | 1 |
| MPS (Apple M2) | 2026.02 | 20.6 | 20.4 | 2.9 | - | 0.13 |
| torch-webgpu (fused, RTX 5090) | 2026.02 | 17.9 | 17.7 | 3.8 | 51.3 | 0.12 |
| torch-webgpu (unfused, RTX 5090) | 2026.02 | 10.4 | 10.4 | 0.9 | 87.9 | 0.07 |
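The "Relative Performance vs CUDA" column appears to be each backend's throughput divided by the CUDA (eager, RTX 5090) throughput, rounded to two decimals. The page does not state this formula, so the sketch below is an assumption that happens to reproduce the listed values; the function name `relative_performance` is hypothetical and not part of any benchmark tooling.

```python
# Throughput values (tok/s) copied from the results table above.
throughput = {
    "CUDA (eager, RTX 5090)": 155.3,
    "MPS (Apple M2)": 20.6,
    "torch-webgpu (fused, RTX 5090)": 17.9,
    "torch-webgpu (unfused, RTX 5090)": 10.4,
}


def relative_performance(results, baseline):
    """Divide every throughput by the baseline's throughput (assumed formula)."""
    base = results[baseline]
    return {name: round(value / base, 2) for name, value in results.items()}


relative = relative_performance(throughput, "CUDA (eager, RTX 5090)")
for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}")

# The same ratio applied to the two torch-webgpu rows gives the fused-over-unfused speedup.
fused_speedup = round(
    throughput["torch-webgpu (fused, RTX 5090)"]
    / throughput["torch-webgpu (unfused, RTX 5090)"],
    2,
)
print(f"fused vs unfused: {fused_speedup:.2f}x")
```

Under this assumption, the fused torch-webgpu backend runs at roughly 1.7 times the throughput of the unfused one, while both remain well below the CUDA baseline.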