| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-image Understanding | MMIU | Accuracy 55.8 | 65 |
| Visual Question Answering | MMIU | Accuracy 71 | 19 |
| Multi-Image Understanding | MMIU 106 (test) | Score 72.1 | 19 |
| Narrative Reasoning | MMIU (test) | BLEURT Score 0.306 | 14 |
| Multi-image Understanding | MMIU (test) | Accuracy 52.6 | 11 |
| Image Understanding | MMIU | MMIU Score 40.2 | 7 |
| Visual Quality Assessment | MMIU visual quality | Accuracy 53 | 3 |
| Text-to-Image Retrieval | MMIU text2image_retrieval | Accuracy 25.2 | 3 |
| Emotion Recognition | MMIU emotion_findingemo | Accuracy 26.9 | 3 |
| Emotion Recognition | MMIU emotion_expw | Accuracy 31.8 | 3 |
| Forensic Detection | MMIU forensic_blink | Accuracy 30.9 | 3 |
| Forensic Detection | MMIU forensic_forgerynet | Accuracy 87.4 | 3 |