| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Concept Attribution | CLEVR (test) | F1 Score: 0.85 | 160 |
| Visual Question Answering | CLEVR (test) | Overall Accuracy: 99.8 | 61 |
| Visual Question Answering | CLEVR 1.0 (test) | Overall Accuracy: 99.6 | 46 |
| Visual Question Answering | CLEVR Style-transferred Target | Accuracy: 95.9 | 24 |
| Image Classification | CLEVR | Accuracy: 76.2 | 22 |
| Visual Question Answering | CLEVR-Humans 1.0 (test) | Accuracy: 85.5 | 22 |
| Concept Attribution | CLEVR | Avg Attribution F1: 51 | 18 |
| Customized few-shot classification | Clevr 4 10k | Shape Accuracy: 98.69 | 18 |
| Unsupervised Object Segmentation | CLEVR 1.0 (test) | FG-ARI: 95.94 | 16 |
| Visual Question Answering | CLEVR (val) | Overall Accuracy: 100 | 15 |
| Shifted rightmost object color inference | CLEVR | Accuracy (Shifted Rightmost Color): 62.06 | 13 |
| Third object from right color inference | CLEVR | Accuracy: 41.95 | 13 |
| Rightmost object material inference | CLEVR | Accuracy: 86.84 | 13 |
| Rightmost object shape inference | CLEVR | Accuracy: 70.03 | 13 |
| Rightmost object size inference | CLEVR | Accuracy: 93.22 | 13 |
| Bottommost object color inference (BC) | CLEVR | BC Accuracy: 84.09 | 13 |
| Leftmost object color inference (LC) | CLEVR | Accuracy: 81.79 | 13 |
| Rightmost object color inference (RC) | CLEVR | Accuracy: 98.75 | 13 |
| Image Generation | CLEVR | FID: 0.81 | 13 |
| Single-object retrieval | CLEVR Cola single-object compounds | mAP (All): 91.1 | 12 |
| Visual Question Answering | CLEVR-CoGenT (val) | Accuracy: 99.8 | 12 |
| Classification rule reverse engineering | CLEVR 1.0 (test) | CaCE: 0.717 | 11 |
| Visual Reasoning | CLEVR 1.0 (test) | Overall Accuracy: 97.7 | 11 |
| Customized Clustering | Clevr 4 10k | Texture NMI: 14.11 | 10 |
| Semantic Retrieval | Clevr v1 (test) | Avg Cost per Query: 0 | 10 |