| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Embodied Reasoning and Question Answering | ERQA | Score 65 | 35 |
| Spatial Reasoning (Multi-Image) | ERQA | Accuracy 51.02 | 23 |
| Multimodal Reasoning | ERQA | Accuracy 55 | 22 |
| Embodied Visual Question Answering | ERQA | Accuracy 59 | 19 |
| Embodied Reasoning | ERQA (test) | Accuracy 70.25 | 12 |
| Embodied Reasoning | ERQA | Accuracy 54.5 | 11 |
| Embodied Reasoning | ERQA (train) | Success Rate (SR) 61.33 | 7 |
| General | ERQA | Score 41.6 | 4 |
| Multimodal Understanding | ERQA | Accuracy 51.3 | 3 |
| Ego-centric Spatial Reasoning | ERQA | Accuracy 36.2 | 2 |