| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-image reasoning | Mantis (test) | Accuracy 72.81 | 39 |
| Multi-image Reasoning | Mantis | Accuracy 81.71 | 38 |
| Adversarial Attack | Mantis Eval | Attack Success Rate 84.57 | 37 |
| Interleaved Image Multimodal Understanding | Mantis | Score 64.2 | 17 |
| Visual Question Answering | Mantis Eval | ASR 71.32 | 12 |
| Multimodal Reasoning | Mantis-Eval | Accuracy 59.23 | 11 |
| Visual Question Answering | Mantis | Accuracy 81.57 | 6 |
| Multi-Image Understanding | Mantis (Eval) | Score 79.7 | 5 |
| Multi-image Visual Question Answering | Mantis | Accuracy 76.5 | 4 |
| Multi-image in the Wild | Mantis | Accuracy 77.6 | 4 |
| Intent Prediction | MANTIS | AP 77.1 | 4 |
| Multi-image Multi-modal Understanding | Mantis | Accuracy 65.4 | 2 |