VLM Evaluation Suite

Benchmarks

Task Name	Dataset Name	SOTA Result
Multimodal Understanding	VLM Evaluation Suite (Hall, MME, AI2D, RWQA, SQA, POPE, MBen, MBzh, CCB, VSR, V7W)	Hall: 63.76	40
Video Generation	VLM Evaluation Suite	Aesthetic Appeal: 8.25	8
Multimodal Understanding	VLM Evaluation Suite (GQA, MMB, MMBCN, MME, POPE, SQA, VQAv2, VQAText) LLaVA-NEXT-7B (test)	GQA Accuracy: 64.2	7