| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-turn visual dialogue | MMDU 45K | Accuracy: 3.79 | 18 |
| Multi-image Dialogue Understanding | MMDU | Accuracy: 26.37 | 12 |
| Multimodal dialogue understanding | MMDU | GPT-4o Score: 0.703 | 10 |
| Multi-turn Multi-image Dialog | MMDU | Accuracy: 66.3 | 4 |