
VLM Evaluation Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multimodal Understanding | VLM Evaluation Suite Hall, MME, AI2D, RWQA, SQA, POPE, MBen, MBzh, CCB, VSR, V7W | Hall: 63.76 | 40 |
| Video Generation | VLM Evaluation Suite | Aesthetic Appeal: 8.25 | 8 |
| Multimodal Understanding | VLM Evaluation Suite (GQA, MMB, MMBCN, MME, POPE, SQA, VQAv2, VQAText) LLaVA-NEXT-7B (test) | GQA Accuracy: 64.2 | 7 |