| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | CLEVR-CoGenT (Condition A) | Accuracy: 99.8 | 21 |
| Visual Question Answering | CLEVR-CoGenT Condition B | Accuracy: 98.2 | 18 |
| Visual Question Answering | CLEVR-CoGenT systematic generalization 1.0 (test) | Accuracy: 77.3 | 3 |
| Vision-Language Reasoning | CLEVR-CoGenT (Split A) | Accuracy: 99.7 | 3 |
| Visual Question Answering | CLEVR-CoGenT (test) | Accuracy (Condition A): 98.8 | 3 |