| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Compositional Understanding | ARO | Relational Score 86.7 | 27 |
| Compositional Reasoning | ARO | Relation Score 83.6 | 17 |
| Vision-Language Compositional Reasoning | ARO | Accuracy 0.804 | 14 |
| Compositional Evaluation | ARO-A (test) | Accuracy 84.49 | 13 |
| Order Sensitivity | ARO | Flickr30K Order 99.4 | 13 |
| Relational Understanding | ARO | Relation Accuracy 59 | 6 |
| Compositional Attribution | ARO Attribution | Accuracy 77.1 | 4 |
| Compositional Retrieval | ARO Retrieval | Accuracy 83.6 | 4 |
| Compositional Reasoning | ARO (test) | Relation Score 83.7 | 4 |
| Vision-Language Compositional Reasoning | ARO (test) | VG-Rel 71.4 | 4 |