Inference Latency on VLM decode average
[Chart: Latency (ms) over time for W3A16; latest value 21.1 ms on Dec 27, 2024. Updated 4d ago.]
Evaluation Results

Method | Model               | Date    | Latency (ms)
W3A16  | LLaVA-onevision-... | 2024.12 | 21.1
W4A8   | LLaVA-onevision-... | 2024.12 | 26.3
FP16   | LLaVA-onevision-... | 2024.12 | 29.6
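A quick way to read the table above is as decode-latency speedups over the FP16 baseline. The sketch below is purely illustrative: it hard-codes the latencies from the table and computes each method's speedup relative to FP16 (it is not part of the benchmark harness).

```python
# Decode latencies (ms) taken from the Evaluation Results table above.
latencies = {"W3A16": 21.1, "W4A8": 26.3, "FP16": 29.6}

fp16 = latencies["FP16"]
for method, ms in latencies.items():
    # Speedup = baseline latency / method latency (higher is faster).
    speedup = fp16 / ms
    print(f"{method}: {ms} ms ({speedup:.2f}x vs FP16)")
```

By this measure, W3A16 decodes about 1.40x faster than FP16 and W4A8 about 1.13x faster.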