vLLM Inference Performance on Qwen3-8B
[Chart: per-method benchmark metrics — Model Load Time (s), First Token Latency (s), Throughput (tok/s), CPU Mem (MiB), Acc Mem (GiB). Dec 4, 2025]
Evaluation Results
| Method | Links | Model Load Time (s) | First Token Latency (s) | Throughput (tok/s) | CPU Mem (MiB) | Acc Mem (GiB) |
|---|---|---|---|---|---|---|
| Safetensors | 2025.12 | 1.732 | 122.039 | 115.87 | 11,095.98 | 15.3 |
| CryptoTensors (Encryption=Unencrypted) | 2025.12 | 1.806 | 122.563 | 115.61 | 11,108.98 | 15.3 |
| CryptoTensors (Encryption=Encrypted) | 2025.12 | 27.056 | 123.134 | 112.85 | 11,233.99 | 15.3 |
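To put the table in context, a short script can derive the relative overheads of encrypted CryptoTensors versus the Safetensors baseline. This is a sketch using the table's values verbatim; the dictionary keys are my own labels, not names from the benchmark harness.

```python
# Benchmark rows copied from the table above (units: s, s, tok/s, MiB, GiB).
rows = {
    "safetensors":         {"load_s": 1.732,  "tput": 115.87, "cpu_mib": 11095.98},
    "cryptotensors_plain": {"load_s": 1.806,  "tput": 115.61, "cpu_mib": 11108.98},
    "cryptotensors_enc":   {"load_s": 27.056, "tput": 112.85, "cpu_mib": 11233.99},
}

base = rows["safetensors"]
enc = rows["cryptotensors_enc"]

# Load-time slowdown of the encrypted path relative to plain Safetensors.
load_slowdown = enc["load_s"] / base["load_s"]
# Steady-state throughput drop, as a percentage of the baseline.
tput_drop_pct = 100 * (1 - enc["tput"] / base["tput"])

print(f"encrypted load slowdown: {load_slowdown:.1f}x")   # ~15.6x slower load
print(f"encrypted throughput drop: {tput_drop_pct:.1f}%") # ~2.6% lower throughput
```

The takeaway matches the table: encryption is almost entirely a one-time model-load cost, while steady-state throughput and memory are close to the unencrypted runs.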