| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | Standard Multimodal Evaluation Suite (GQA, MMBench, MME, TextVQA, ScienceQA, VQA v2) 1.5 (test val) | GQA Score: 63.2 | 32 |
| Multimodal Question Answering and Understanding | Standard Multimodal Evaluation Suite GQA, MMB, MME, VQA-T, SQA-I, VQA-v2, POPE, MMMU, MM-Vet | GQA Accuracy: 61.9 | 26 |