| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-image reasoning | Mantis (test) | Accuracy 72.81 | 39 |
| Adversarial Attack | Mantis Eval | Attack Success Rate 84.57 | 37 |
| Multi-image Reasoning | Mantis | Accuracy 71 | 18 |
| Visual Question Answering | Mantis Eval | ASR 71.32 | 12 |
| Multimodal Reasoning | Mantis-Eval | Accuracy 59.23 | 11 |
| Interleaved Image Multimodal Understanding | Mantis | Score 64.2 | 7 |
| Multi-image Visual Question Answering | Mantis | Accuracy 76.5 | 4 |
| Multi-image in the Wild | Mantis | Accuracy 77.6 | 4 |
| Intent Prediction | MANTIS | AP 77.1 | 4 |
| Multi-image Multi-modal Understanding | Mantis | Accuracy 65.4 | 2 |