| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Perception | BLINK | Accuracy | 72.7 | 122 |
| Visual Reasoning | BLINK | Accuracy | 85.2 | 76 |
| Adversarial Attack | BLINK | Attack Success Rate (ASR) | 87.65 | 37 |
| Visual Reasoning | BLINK | Jigsaw Accuracy | 99 | 29 |
| Visual Question Answering | BLINK (val) | Accuracy | 73.7 | 29 |
| Visual Perception | BLINK (val) | Validation Score | 95.67 | 29 |
| Spatial Reasoning | BLINK | Spa. Score | 88.11 | 26 |
| Multi-image Visual Perception | BLINK | Accuracy | 62.8 | 26 |
| Multi-image Understanding | BLINK (val) | Score | 68 | 23 |
| Interleaved Image Multimodal Understanding | BLINK | Score | 66.3 | 22 |
| Visual Understanding | BLINK | Accuracy | 69.86 | 21 |
| Spatial Reasoning | BLINK | Score | 69.1 | 21 |
| Multi-image Reasoning | BLINK (val) | Accuracy | 52.6 | 21 |
| Low-level Visual Reasoning | BLINK | Accuracy | 72.3 | 19 |
| Visual Perception | Blink 41 (val) | Score | 87.4 | 19 |
| Relative Depth Estimation | BLINK RelativeDepth (test) | Accuracy | 87.9 | 18 |
| 3D Spatial Reasoning | BLINK | Accuracy | 60 | 16 |
| Multimodal Multi-choice | BLINK | Accuracy | 60 | 15 |
| Spatial Reasoning | BLINK Multi-view (test) | Accuracy | 63.91 | 15 |
| Multimodal Reasoning | BLINK | Accuracy | 56.4 | 15 |
| Visual Understanding | BLINK sub-tasks | Jigsaw Accuracy | 90.67 | 14 |
| Visual Reasoning | BLINK-J | Accuracy | 88 | 14 |
| Visual Question Answering | BLINK Relative-Depth | Accuracy | 83.1 | 12 |
| Visual Question Answering | BLINK Spatial-Relation | Accuracy | 87.4 | 12 |
| Compositional Reasoning | BLINK | Accuracy | 68 | 12 |