VLM Inference Latency on Qualcomm Snapdragon 8 Gen 3 SoC (llama.cpp quantization)
[Chart: VE Latency (ms/patch) over time. Best result: 6.82 ms/patch (MobileVLM-336, Dec 28, 2023). Other tracked metrics: Sample Throughput (tokens/s), Prompt Evaluation Throughput (tokens/s), Generation Throughput (tokens/s), Total Inference Time (s).]
Evaluation Results
| Method | Language Model | Date | VE Latency (ms/patch) | Sample Throughput (tokens/s) | Prompt Evaluation Throughput (tokens/s) | Generation Throughput (tokens/s) | Total Inference Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MobileVLM-336 | MobileL... | 2023.12 | 6.82 | 34,892 | 34.93 | 21.54 | 18.51 |
| LLaVA-v1.5-336 | TinyLLa... | 2023.12 | 7.77 | 31,370 | 41.7 | 18.4 | 20.7 |
| LLaVA-v1.5-336 | OpenLLa... | 2023.12 | 7.98 | 27,530 | 8.95 | 7.22 | 84.43 |
| LLaVA-v1.5-336 | Vicuna... | 2023.12 | 8.23 | 17,347 | 5.36 | 0.25 | 329.89 |
| MobileVLM-336 | MobileL... | 2023.12 | 8.43 | 27,660 | 18.36 | 12.21 | 33.1 |