Inference Latency on VLM prefill 1024 tokens

Latency (ms): 92.7

W4A8

Updated 3mo ago

Evaluation Results

Method	Date	Latency (ms)
W4A8	2024.12	92.7
FP16	2024.12	109
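
To put the two latencies in context, the short sketch below derives the relative speedup, the percentage latency reduction, and an approximate prefill throughput. It assumes that each reported figure is the end-to-end time to prefill all 1024 tokens, and that throughput is simply tokens divided by latency; the page states neither, and the function names are illustrative only.

```python
# Assumption: each latency covers one full 1024-token prefill pass.
PREFILL_TOKENS = 1024

# Latencies in milliseconds, copied from the results table above.
LATENCY_MS = {"W4A8": 92.7, "FP16": 109.0}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def latency_reduction_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage by which the candidate lowers latency versus the baseline."""
    return 100.0 * (baseline_ms - candidate_ms) / baseline_ms


def prefill_tokens_per_second(tokens: int, latency_ms: float) -> float:
    """Approximate prefill throughput, assuming latency spans all tokens."""
    return tokens / (latency_ms / 1000.0)


fp16 = LATENCY_MS["FP16"]
w4a8 = LATENCY_MS["W4A8"]
print(f"Speedup of W4A8 over FP16: {speedup(fp16, w4a8):.3f}x")
print(f"Latency reduction: {latency_reduction_pct(fp16, w4a8):.2f}%")
for method, ms in LATENCY_MS.items():
    rate = prefill_tokens_per_second(PREFILL_TOKENS, ms)
    print(f"{method}: about {rate:.0f} prefill tokens/s")
```

Under these assumptions, W4A8 is roughly 1.18x faster than FP16 on this benchmark, which corresponds to a latency about 15% lower.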

SOTA Paper

W4A8

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Dataset

VLM prefill

Related Benchmarks

Optical Character Recognition on OCRBench

Multimodal Understanding and Reasoning on MMMU, SEED, OCRBench, VizWiz, ScienceQA, and TextVQA (test/val)

Multimodal Understanding on SEED (Metrics: SEED, Average)

Multimodal Understanding on MMMU 1.0 (test)

Inference Latency on VLM decode average
