vLLM Inference Performance on Qwen3-32B
[Chart: Model Load Time (s) by date; Safetensors at 7.641 s on Dec 4, 2025. Selectable metrics: Model Load Time (s), First Token Latency (s), Throughput (tok/s), CPU Memory (MiB), Activation Memory (GiB).]
Evaluation Results
| Method | Date | Model Load Time (s) | First Token Latency (s) | Throughput (tok/s) | CPU Memory (MiB) | Activation Memory (GiB) |
| --- | --- | --- | --- | --- | --- | --- |
| Safetensors | 2025.12 | 7.641 | 218.374 | 43.09 | 12,752.61 | 61.57 |
| CryptoTensors (Encryption=Unencrypted) | 2025.12 | 7.668 | 217.488 | 42.15 | 12,757.07 | 61.57 |
| CryptoTensors (Encryption=Encrypted) | 2025.12 | 111.85 | 218.584 | 42.93 | 12,795.65 | 61.57 |
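
The reported figures are easier to compare when expressed relative to the Safetensors baseline. The following is a minimal sketch of that arithmetic, using only the load-time and throughput values listed in the results; the dictionary layout and the helper name `relative_to_baseline` are illustrative assumptions, not part of any benchmark tooling.

```python
# Figures copied from the evaluation results (Qwen3-32B on vLLM, 2025.12).
# The data layout and helper name are hypothetical, chosen for illustration.
results = {
    "Safetensors": {"load_s": 7.641, "throughput_tok_s": 43.09},
    "CryptoTensors (Unencrypted)": {"load_s": 7.668, "throughput_tok_s": 42.15},
    "CryptoTensors (Encrypted)": {"load_s": 111.85, "throughput_tok_s": 42.93},
}


def relative_to_baseline(rows, baseline="Safetensors"):
    """Return the load-time ratio and throughput change (%) versus the baseline."""
    base = rows[baseline]
    summary = {}
    for name, row in rows.items():
        summary[name] = {
            # How many times longer the model takes to load than the baseline.
            "load_ratio": row["load_s"] / base["load_s"],
            # Signed percentage difference in throughput versus the baseline.
            "throughput_change_pct": 100.0
            * (row["throughput_tok_s"] - base["throughput_tok_s"])
            / base["throughput_tok_s"],
        }
    return summary


summary = relative_to_baseline(results)
for name, row in summary.items():
    print(
        f"{name}: load x{row['load_ratio']:.2f}, "
        f"throughput {row['throughput_change_pct']:+.2f}%"
    )
```

On these numbers, loading the encrypted CryptoTensors checkpoint takes roughly 14.6 times as long as loading Safetensors, while throughput in every configuration stays within 3% of the baseline.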