| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Compositional Vision-Language Reasoning | Winoground | Text Score 89.5 | 47 |
| Compositional Scene Understanding | Winoground | Text Alignment Score 64 | 29 |
| Image-Text Matching | Winoground | Text Agreement Score 89.5 | 26 |
| Compositional Reasoning | Winoground | Txt2Img Score 40.25 | 21 |
| Visual Question Answering | WinogroundVQA v1.0 (test) | Accuracy 46.5 | 14 |
| Fine-grained retrieval | Winoground (test) | Text Agreement (%) 40 | 12 |
| Image-text alignment | Winoground (test) | Text Score 89.5 | 12 |
| Fine-grained Image-Text Matching | Winoground | Group Agreement 25.8 | 11 |
| Vision-Language Reasoning | Winoground | Simple Acc 59.88 | 9 |
| Text-to-image retrieval | Winoground | R@1 (T2I) 0.133 | 8 |
| Vision-Language Compositional Reasoning | Winoground standard (test) | Text Score 75.5 | 7 |
| Text Selection | Winoground | Text Score 34 | 7 |
| Image Selection | Winoground | Image Score 14 | 7 |
| Image-Text Matching | Winoground 1.0 (full) | Text Agreement Score 89.5 | 5 |
| Vision-Language Reasoning | Winoground | Text Score 30.5 | 4 |
| Compositional Evaluation | Winoground Txt2Img | Txt2Img Score 14 | 4 |
| Vision-Language Compositional Reasoning | Winoground (test) | Object Score 0.461 | 4 |
| Image-Text Matching | Winoground clean | Text Agreement Score 52.63 | 4 |
| Image-Text Matching | Winoground (full) | Accuracy 52.7 | 3 |
| Vision-Language Compositional Reasoning | Winoground 1.0 (test) | Text Score 42.5 | 3 |
| Compositional Reasoning | Winoground (test) | Image Accuracy 27 | 3 |
| Paired-prompt evaluation | Winoground | Simple Accuracy 58.81 | 2 |
| Compositional Reasoning | Winoground clean 171 samples | Text Score 31.58 | 2 |
| Vision-Language Compositional Reasoning | Winoground clean (no-tag) | Text Score 32.16 | 2 |