
Multimodal Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multimodal Understanding and Question Answering | Multimodal Benchmarks (MME, OCRBench, DocVQA, RealWorldQA, VLMBlind) | MME Score: 2,386 | 33 |
| Multimodal Question Answering | 9 Multimodal Benchmarks (VQAv2, GQA, VizWiz, SQA-IMG, TextVQA, POPE, MME, MMB, MMB-CN) (test val) | VQAv2 Accuracy: 80 | 15 |
| Multimodal In-context Learning | Multimodal Benchmarks Average | Accuracy: 67.2 | 9 |