vLLM Inference Performance on Qwen3-4B
[Chart: Model Load Time (s) by method (data point shown: Safetensors, 1.101 s, Dec 4, 2025). The chart can also plot First Token Latency (s), Throughput (tok/s), CPU Mem (MiB), and Acc Mem (GiB).]
Evaluation Results
Method                                  Date     Model Load Time (s)  First Token Latency (s)  Throughput (tok/s)  CPU Mem (MiB)  Acc Mem (GiB)
Safetensors                             2025.12  1.101                123.277                  126.13              11,075.59      7.58
CryptoTensors (Encryption=Unencrypted)  2025.12  1.104                123.406                  125.23              11,088.67      7.58
CryptoTensors (Encryption=Encrypted)    2025.12  11.918               123.467                  126.84              11,608.82      7.58
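To read the table as overhead relative to the plain Safetensors baseline, the short Python sketch below computes the percent change of each metric for both CryptoTensors configurations. The numbers are copied from the table; the metric keys (load_s, ttft_s, tok_per_s, cpu_mib, acc_gib) and the helper name relative_change are hypothetical names chosen for this sketch, not part of vLLM or CryptoTensors.

```python
# Percent change of each benchmark metric relative to the Safetensors baseline.
# Figures are copied from the Evaluation Results table; the metric keys and the
# helper name relative_change are hypothetical, chosen only for this sketch.

BASELINE = "Safetensors"

RESULTS: dict[str, dict[str, float]] = {
    "Safetensors": {
        "load_s": 1.101,       # Model Load Time (s)
        "ttft_s": 123.277,     # First Token Latency (s)
        "tok_per_s": 126.13,   # Throughput (tok/s)
        "cpu_mib": 11075.59,   # CPU Mem (MiB)
        "acc_gib": 7.58,       # Acc Mem (GiB)
    },
    "CryptoTensors (Encryption=Unencrypted)": {
        "load_s": 1.104,
        "ttft_s": 123.406,
        "tok_per_s": 125.23,
        "cpu_mib": 11088.67,
        "acc_gib": 7.58,
    },
    "CryptoTensors (Encryption=Encrypted)": {
        "load_s": 11.918,
        "ttft_s": 123.467,
        "tok_per_s": 126.84,
        "cpu_mib": 11608.82,
        "acc_gib": 7.58,
    },
}


def relative_change(baseline: dict[str, float], other: dict[str, float]) -> dict[str, float]:
    """Return (other - baseline) / baseline * 100 for every metric in baseline."""
    return {key: (other[key] - baseline[key]) / baseline[key] * 100.0 for key in baseline}


if __name__ == "__main__":
    base = RESULTS[BASELINE]
    for name, metrics in RESULTS.items():
        if name == BASELINE:
            continue  # skip comparing the baseline with itself
        print(name)
        for metric, pct in relative_change(base, metrics).items():
            print(f"  {metric:<10} {pct:+8.2f}%")
```

On the table's numbers, the unencrypted configuration stays within 1% of Safetensors on every metric, while the encrypted one takes roughly 10.8x as long to load (about +982%) and uses about 4.8% more CPU memory; its first-token latency, throughput, and Acc Mem stay within 1% of the baseline.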