MLLM Evaluation Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multimodal Large Language Model Evaluation | MLLM Evaluation Suite | Average Score (All): 56.7 | 22 |
| Multimodal Question Answering | MLLM Evaluation Suite (GQA, MMB, MMB-CN, MME, POPE, SQA, VQAv2, VQA-Text, VizWiz) (test) | GQA Accuracy: 64.27 | 11 |
| Multimodal Question Answering | MLLM Evaluation Suite (HallBench, MME, TextVQA, ChartQA, AI2D, RealWorldQA, CCBench, OCRVQA, SQA-IMG, POPE) (test) | HallBench: 49.8 | 7 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQA v2, HRBench-8k, XLRS) | GQA Score: 60.9 | 7 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQAv2, MMMU, SEED-I), LLaVA-NeXT (test) | GQA Accuracy: 64.2 | 7 |
| Multimodal Understanding | MLLM Evaluation Suite (GQA, MMB, MME, POPE, SQA, VQAv2, VQAText, MMMU, SEED-I, VizWiz) | GQA Score: 65.4 | 4 |
| Multimodal Large Language Model Evaluation | MLLM Evaluation Suite (MME, MMStar, SQA, RealWorldQA, MMMU, MMMU-P, VisuLogic, LogicVista, CRPE, POPE, HallBench) (test) | MME: 74.94 | 4 |
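The exact aggregation behind the "Average Score (All)" figure is not documented here. A common convention for suite-level aggregates like this is an unweighted macro-average over per-benchmark scores, with benchmarks that use a different native scale (e.g., MME's raw point totals) rescaled to 0-100 first. The sketch below illustrates that computation under those assumptions; the benchmark scores are hypothetical placeholders, not values from the table above.

```python
# Minimal sketch of a macro-averaged suite score, assuming each
# per-benchmark score has already been normalized to a 0-100 scale.
# The aggregation rule and the scores here are illustrative
# assumptions, not the suite's documented method.
from statistics import mean

# Hypothetical per-benchmark scores for one model (0-100 scale).
scores = {
    "GQA": 64.2,
    "MMBench": 68.5,
    "POPE": 85.1,
    "ScienceQA": 70.3,
}

# Unweighted (macro) average: every benchmark counts equally,
# regardless of dataset size.
average_score = mean(scores.values())
print(f"Average Score (All): {average_score:.1f}")
```

A macro-average like this treats small and large benchmarks equally; a size-weighted (micro) average would instead let benchmarks with more examples dominate, so the two can rank models differently.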