| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Compositional Reasoning | SugarCrepe | Overall Accuracy | 87.5 | 43 |
| Language Compositionality | SugarCrepe (test) | Replace: Object (R@1) | 100 | 21 |
| Vision-Language Compositionality | SugarCrepe | Accuracy | 88.06 | 20 |
| Compositional Evaluation | SugarCrepe (test) | Replace (Object) | 95.52 | 20 |
| Image-Text Matching | SugarCrepe | AURC | 16.7 | 17 |
| Image-Text Compositionality Evaluation | SugarCrepe ++ (test) | Swap Object ITT | 100 | 17 |
| Compositional Evaluation | SugarCrepe swap att (test) | Accuracy | 82.1 | 13 |
| Visual Question Answering | SugarCrepe | Simple Accuracy | 82.14 | 9 |
| Compositional Image-Text Matching | SugarCrepe | Replacement Score | 88.7 | 9 |
| Compositional Reasoning | SugarCrepe 1.0 (test) | Replace Acc (Object) | 100 | 8 |
| Language Compositionality | SugarCrepe 1.0 (test) | Recall@1 (Replace, Object) | 88.1 | 8 |
| Image-to-text retrieval | SugarCrepe | R@1 (Add) | 73.8 | 8 |
| Vision-Language Reasoning | SugarCrepe (test) | Simple Accuracy | 62.75 | 7 |
| Image-Caption Alignment | SugarCrepe (test) | Replace Object | 96.9 | 7 |
| Hallucination Reasoning | SugarCrepe | Accuracy | 86.4 | 5 |
| Vision-Language Compositional Reasoning | SugarCrepe++ | Accuracy | 66.1 | 5 |
| Image-Text Matching | SugarCREPE sampled balanced | Accuracy | 58.7 | 3 |
| Paired-prompt evaluation | SugarCrepe | Simple Accuracy | 64.56 | 2 |