| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Compositional Reasoning | SugarCrepe | Overall Accuracy | 87.5 | 50 |
| Compositional Evaluation | SugarCrepe swap att (test) | Accuracy | 82.1 | 27 |
| Compositional Evaluation | SugarCrepe | Add Score | 94.2 | 21 |
| Image-Text Compositionality Evaluation | SugarCrepe ++ (test) | Replace ITT | 79.7 | 21 |
| Language Compositionality | SugarCrepe (test) | Replace: Object (R@1) | 100 | 21 |
| Vision-Language Compositionality | SugarCrepe | Accuracy | 88.06 | 20 |
| Vision-Language Compositional Reasoning | SugarCrepe++ | Accuracy | 66.24 | 20 |
| Compositional Evaluation | SugarCrepe (test) | Replace (Object) | 95.52 | 20 |
| Image-Text Matching | SugarCrepe | AURC | 16.7 | 17 |
| Text-to-Image Compositional Understanding | SugarCrepe++ T2I | Accuracy | 61.05 | 15 |
| Compositional Understanding | SugarCrepe | Accuracy | 89.23 | 15 |
| Attribute-binding | SugarCrepe++ | Replace-I2T | 79.8 | 11 |
| Attribute-binding | SugarCrepe | Replace Accuracy | 89.5 | 11 |
| Visual Question Answering | SugarCrepe | Simple Accuracy | 82.14 | 9 |
| Compositional Image-Text Matching | SugarCrepe | Replacement Score | 88.7 | 9 |
| Compositional Reasoning | SUGARCREPE (test) | Accuracy | 86.3 | 8 |
| Compositional Reasoning | SugarCrepe 1.0 (test) | Replace Acc (Object) | 100 | 8 |
| Language Compositionality | SugarCrepe 1.0 (test) | Recall@1 (Replace, Object) | 88.1 | 8 |
| Image-to-text retrieval | SugarCrepe | R@1 (Add) | 73.8 | 8 |
| Compositional Reasoning | SugarCrepe++ | Replace I2T | 79.7 | 7 |
| Vision-Language Reasoning | SugarCrepe (test) | Simple Accuracy | 62.75 | 7 |
| Image-Caption Alignment | SugarCrepe (test) | Replace Object | 96.9 | 7 |
| Hard-negative classification | SugarCrepe | Replace: Object Accuracy | 91.38 | 6 |
| Hallucination Reasoning | SugarCrepe | Accuracy | 86.4 | 5 |
| Vision-Language Alignment | Sugarcrepe swap-object | Accuracy | 63.8 | 4 |